9    Managing File System Performance

To tune for better file system performance, you must understand how your applications and users perform disk I/O, as described in Section 2.1, and how the file system you are using shares memory with processes, as described in Chapter 6. Using this information, you might improve file system performance by changing the values of the kernel subsystem attributes described in this chapter.

This chapter describes how to tune:

9.1    Tuning Caches

The kernel caches (temporarily stores) recently accessed data in memory. Caching data is effective because data is frequently reused, and it is much faster to retrieve data from memory than from disk. When the kernel requires data, it first checks whether the data is cached. If the data is cached, it is returned immediately; if it is not, the data is retrieved from disk and then cached. File system performance is improved if data is cached and later reused.

Data found in a cache is called a cache hit, and the effectiveness of cached data is measured by a cache hit rate. Data that was not found in a cache is called a cache miss.

Cached data can be information about a file, user or application data, or metadata, which is data that describes an object (for example, a file). The following list identifies the types of data that are cached:

9.1.1    Tuning the namei Cache

The Virtual File System (VFS) presents to applications a uniform kernel interface that is abstracted from the subordinate file system layer. As a result, file access across different types of file systems is transparent to the user.

The VFS uses a structure called a vnode to store information about each open file in a mounted file system. If an application makes a read or write request on a file, VFS uses the vnode information to convert the request and direct it to the appropriate file system. For example, if an application makes a read() system call request on a file, VFS converts the system call to the type appropriate for the file system containing the file (ufs_read() for UFS, advfs_read() for AdvFS, or nfs_read() if the file is in a file system mounted through NFS) and then directs the request to that file system.

The VFS caches a recently accessed file name and its corresponding vnode in the namei cache. File system performance is improved if a file is reused and its name and corresponding vnode are in the namei cache.

Related Attributes

The following list describes the vfs subsystem attributes that relate to the namei cache:

Note

If you increase the values of namei cache related attributes, consider also increasing the values of the file system attributes that cache file and directory information. If you use AdvFS, see Section 9.1.4 for more information. If you use UFS, see Section 9.1.3 for more information.

When to Tune

You can check namei cache statistics to see if you should change the values of namei cache related attributes. To check namei cache statistics, enter the dbx print command and specify a processor number to examine the nchstats data structure, for example:

# /usr/ucb/dbx -k /vmunix /dev/mem 
(dbx) print processor_ptr[0].nchstats
 
 

Information similar to the following is displayed:

struct {
        ncs_goodhits = 18984
        ncs_neghits = 358
        ncs_badhits = 113
        ncs_falsehits = 23
        ncs_miss = 699
        ncs_long = 21
        ncs_badtimehits = 33
        ncs_collisions = 2
        ncs_unequaldups = 0
        ncs_newentry = 697
        ncs_newnegentry =  419
        ncs_gnn_hit = 1653
        ncs_gnn_miss = 12
        ncs_gnn_badhits = 12
        ncs_gnn_collision = 4
        ncs_pad = {
            [0] 0
        }
} 
 
 

The following table describes when you might change the values of namei cache related attributes, based on the dbx print output:

If:                                         Increase:

(ncs_goodhits + ncs_neghits) divided by     The value of either the maxusers attribute or
(ncs_goodhits + ncs_neghits + ncs_miss +    the name_cache_hash_size attribute
ncs_falsehits) is less than 80 percent

ncs_badtimehits is more than 0.1 percent    The value of the namei_cache_valid_time
of ncs_goodhits                             attribute and the vnode_age attribute
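For example, using the sample nchstats output shown above, the hit rate is (18984 + 358) / (18984 + 358 + 699 + 23), which is 19342 / 20064, or approximately 96 percent. Because this is well above 80 percent, the namei cache in that example would not need tuning.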

You cannot modify the values of the name_cache_hash_size attribute, the namei_cache_valid_time attribute, or the vnode_deallocation_enable attribute without rebooting the system. You can modify the value of the vnode_age attribute without rebooting the system. See Section 3.6 for information about modifying subsystem attributes.
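As a sketch of the procedure referenced in Section 3.6, the following commands query and then change the vnode_age attribute of the vfs subsystem at run time; the value shown is illustrative only, not a recommendation. Attributes that require a reboot, such as name_cache_hash_size, would instead be set in the vfs stanza of /etc/sysconfigtab before rebooting.

# sysconfig -q vfs vnode_age
# sysconfig -r vfs vnode_age=240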

9.1.2    Tuning the UBC

The Unified Buffer Cache (UBC) shares with processes the memory that is not wired by the kernel, and uses it to cache UFS user and application data as well as AdvFS user and application data and metadata. File system performance is improved if the data and metadata are reused and found in the UBC.

Related Attributes

The following list describes the vm subsystem attributes that relate to the UBC:

Note

If the values of the ubc_maxpercent and ubc_minpercent attributes are close, you may degrade file system performance.

When to Tune

An insufficient amount of memory allocated to the UBC can impair file system performance. Because the UBC and processes share memory, changing the values of UBC related attributes might cause the system to page. You can use the vmstat command to display virtual memory statistics that will help you determine whether you need to change the values of UBC related attributes. The following table describes when you might change the values of UBC related attributes, based on the vmstat output:

If vmstat Output Displays Excessive:   Action:

Paging but few or no page outs         Increase the value of the ubc_borrowpercent
                                       attribute.

Paging and swapping                    Decrease the value of the ubc_maxpercent attribute.

Paging                                 Force the system to reuse pages in the UBC instead
                                       of pages from the free list. To do this, ensure
                                       that the value of the ubc_maxpercent attribute is
                                       greater than the value of the vm_ubcseqstartpercent
                                       attribute (which it is by default) and that the
                                       value of the vm_ubcseqpercent attribute is large
                                       enough to accommodate the file being referenced.

Page outs                              Increase the value of the ubc_minpercent attribute.

See Section 6.3.1 for information on the vmstat command. See Section 6.1.2.2 for information about UBC memory allocation.

You can modify the value of any of the UBC parameters described in this section without rebooting the system. See Section 3.6 for information about modifying subsystem attributes.
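As a sketch of a run-time change using the interface referenced in Section 3.6, the following commands query and then adjust one of the vm subsystem attributes; the percentage shown is illustrative only, not a recommendation:

# sysconfig -q vm ubc_maxpercent
# sysconfig -r vm ubc_maxpercent=90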

Note

The performance of an application that generates a lot of random I/O is not improved by a large UBC, because the next access location for random I/O cannot be predetermined.

9.1.3    Tuning the Metadata Buffer Cache

At boot time, the kernel wires a percentage of memory for the metadata buffer cache. UFS file metadata, such as superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries, is cached in the metadata buffer cache. File system performance is improved if the metadata is reused and found in the metadata buffer cache.

Related Attributes

The following list describes the vfs subsystem attributes that relate to the metadata buffer cache:

You cannot modify the values of the buffer_hash_size attribute or the bufcache attribute without rebooting the system. See Section 3.6 for information about modifying kernel subsystem attributes.

When to Tune

Consider increasing the value of the bufcache attribute if you have a high cache miss rate (low hit rate).

To determine if you have a high cache miss rate, use the dbx print command to display the bio_stats data structure. If the miss rate (block misses divided by the sum of the block misses and block hits) is more than 3 percent, consider increasing the value of the bufcache attribute. See Section 9.3.2.3 for more information on displaying the bio_stats data structure. Note that increasing the value of the bufcache attribute will reduce the amount of memory available to processes and the UBC.
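Because the bufcache attribute cannot be modified without a reboot, such a change would typically be made by adding or editing a vfs stanza in /etc/sysconfigtab and then rebooting, as referenced in Section 3.6. The percentage below is illustrative only, not a recommendation:

vfs:
        bufcache = 5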

9.1.4    Tuning AdvFS Access Structures

At boot time, the system reserves a portion of the physical memory that is not wired by the kernel for AdvFS access structures. AdvFS caches information about open files, and about files that were opened but are now closed, in AdvFS access structures. File system performance is improved if the file information is reused and found in an access structure.

AdvFS access structures are dynamically allocated and deallocated according to the kernel configuration and system demands.

Related Attribute

The AdvfsAccessMaxPercent attribute specifies, as a percentage, the maximum amount of pageable memory that can be allocated for AdvFS access structures.

Value: 5 to 95

Default value: 25 percent

You can modify the value of the AdvfsAccessMaxPercent attribute without rebooting the system. See Section 3.6 for information about modifying kernel subsystem attributes.

When to Tune

If users or applications reuse AdvFS files (for example, on a proxy server), consider increasing the value of the AdvfsAccessMaxPercent attribute to allocate more memory for AdvFS access structures. Note that increasing the value of the AdvfsAccessMaxPercent attribute reduces the amount of memory available to processes and might cause excessive paging and swapping. You can use the vmstat command to display virtual memory statistics that will help you determine whether paging and swapping are excessive. See Section 6.3.1 for information on the vmstat command.
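As a sketch of a run-time change (assuming the attribute belongs to the advfs kernel subsystem and using the interface referenced in Section 3.6), the following command raises the limit; the percentage is illustrative only:

# sysconfig -r advfs AdvfsAccessMaxPercent=35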

Consider decreasing the amount of memory reserved for AdvFS access structures if:

9.2    Tuning AdvFS

This section describes how to tune Advanced File System (AdvFS) queues, AdvFS configuration guidelines, and commands that you can use to display AdvFS information.

See the AdvFS Administration manual for information about AdvFS features and setting up and managing AdvFS.

9.2.1    Tuning AdvFS Queues

For each AdvFS volume, I/O requests are sent to one of the following queues:

All three queues (blocking, flush, and lazy) move buffers to the device queue. As buffers are moved onto the device queue, logically contiguous I/Os are consolidated into larger I/O requests. This reduces the actual number of I/Os that must be completed. Buffers on the device queue cannot be modified until their I/O has completed.

The algorithms that move buffers onto the device queue favor taking buffers from the blocking queue over the flush queue, and both are favored over the lazy queue. The size of the device queue is limited by device and driver resources. The algorithms that load the device queue use feedback from the drivers to determine when the device queue is full. At that point the device is saturated, and continued movement of buffers to the device queue would only degrade throughput to the device. The potential size of the device queue, and how full it is, ultimately determine how long it may take to complete a synchronous I/O operation.

Figure 9-1 shows the movement of synchronous and asynchronous I/O requests through the AdvFS I/O queues.

Figure 9-1:  AdvFS I/O Queues

Detailed descriptions of the AdvFS lazy queues are as follows:

Related Attributes

The following list describes the vfs subsystem attributes that relate to AdvFS queues:

You can modify the value of the AdvfsSyncMmapPages attribute and the AdvfsReadyQLim attribute without rebooting the system. See Section 3.6 for information about modifying kernel subsystem attributes.

When to Tune

If you reuse data, consider increasing:

9.2.2    AdvFS Configuration Guidelines

The amount of I/O contention on the volumes in a file domain is the most critical factor for fileset performance. Contention is most likely on large, very busy file domains. To help you determine how to set up filesets, first identify:

Then, use the previous information and the following guidelines to configure filesets and file domains:

Table 9-1 lists additional AdvFS configuration guidelines and performance benefits and tradeoffs. See the AdvFS Administration manual for more information about AdvFS.

Table 9-1:  AdvFS Configuration Guidelines

Benefit Guideline Tradeoff
Data loss protection Use LSM or RAID to store data using RAID 1 (mirror data) or RAID 5 (Section 9.2.2.1) Requires LSM or RAID
Data loss protection Force synchronous writes or enable atomic write data logging on a file (Section 9.2.2.2) Might degrade file system performance
Improve performance for applications that read or write data only once Enable direct I/O (Section 9.2.2.3) Degrades performance of applications that repeatedly access the same data
Improve performance Use AdvFS to distribute files in a file domain (Section 9.2.2.4) None
Improve performance Stripe data (Section 9.2.2.5) None if using AdvFS; otherwise, requires LSM or RAID
Improve performance Defragment file domains (Section 9.2.2.6) None
Improve performance Decrease the I/O transfer size (Section 9.2.2.7) None
Improve performance Move the transaction log to a fast or uncongested disk (Section 9.2.2.8) Might require an additional disk

9.2.2.1    Storing Data Using RAID 1 or RAID 5

You can use LSM or hardware RAID to implement a RAID 1 or RAID 5 data storage configuration.

In a RAID 1 configuration, LSM or hardware RAID stores and maintains mirrors (copies) of file domain or transaction log data on different disks. If a disk fails, LSM or hardware RAID uses a mirror to make the data available.

In a RAID 5 configuration, LSM or hardware RAID stores parity information and data. If a disk fails, LSM or hardware RAID uses the parity information and data on the remaining disks to reconstruct the missing data.

See the Logical Storage Manager manual for more information about LSM. See your storage hardware documentation for more information about hardware RAID.

9.2.2.2    Forcing a Synchronous Write Request or Enabling Atomic Write Data Logging

AdvFS writes data to disk in 8-KB units. By default, AdvFS asynchronous write requests are cached in the UBC, and the write system call returns a success value. The data is written to disk at a later time (asynchronously). AdvFS does not guarantee that all or part of the data will actually be written to disk if a crash occurs during or immediately after the write. For example, if the system crashes during a write that consists of two 8-KB units of data, only a portion (less than 16 KB) of the total write might have succeeded. This can result in partial data writes and inconsistent data.

You can configure AdvFS to force the write request for a specified file to be synchronous to ensure that data is successfully written to disk before the write system call returns a success value.

Enabling atomic write data logging for a specified file writes the data to the transaction log file before it is written to disk. If a system crash occurs during or immediately after the write system call, the data in the log file is used to reconstruct the write system call upon recovery.

You cannot enable both forced synchronous writes and atomic write data logging on a file. However, you can enable atomic write data logging on a file and also open the file with an O_SYNC option. This ensures that the write is synchronous, but also prevents partial writes if a crash occurs before the write system call returns.

To force synchronous write requests, enter:

# chfile -l on filename

A file that has atomic write data logging enabled cannot be memory mapped by using the mmap system call, and it cannot have direct I/O enabled (see Section 9.2.2.3). To enable atomic write data logging, enter:

# chfile -L on filename

To enable atomic write data logging on AdvFS files that are NFS mounted, ensure that:

9.2.2.3    Enabling Direct I/O

You can enable direct I/O to significantly improve disk I/O throughput for applications that do not frequently reuse previously accessed data. The following lists considerations if you enable direct I/O:

You cannot enable direct I/O for a file if it is already opened for data-logging or if it is memory mapped. Use the fcntl system call with the F_GETCACHEPOLICY argument to determine if an open file has direct I/O enabled.

To enable direct I/O for a specific file, use the open system call and set the O_DIRECTIO file access flag. A file is opened for direct I/O until all users close the file.

See fcntl(2), open(2), AdvFS Administration, and the Programmer's Guide for more information.

9.2.2.4    Using AdvFS to Distribute Files

If the files in a multivolume domain are not evenly distributed, performance might be degraded. You can distribute space evenly across volumes in a multivolume file domain to balance the percentage of used space among volumes in a domain. Files are moved from one volume to another until the percentage of used space on each volume in the domain is as equal as possible.

To display volume information and determine whether you need to balance files, enter:

# showfdmn file_domain_name

Information similar to the following is displayed:

               Id     Date Created       LogPgs Version   Domain Name
3437d34d.000ca710  Sun Oct 5 10:50:05 1999  512       3   usr_domain
 Vol  512-Blks   Free % Used  Cmode Rblks  Wblks  Vol Name 
  1L   1488716 549232    63%     on   128    128  /dev/disk/dsk0g
  2     262144 262000     0%     on   128    128  /dev/disk/dsk4a
     --------- -------  ------
       1750860 811232    54%

The % Used field shows the percentage of volume space that is currently allocated to files or metadata (the fileset data structure). In the previous example, the usr_domain file domain is not balanced. Volume 1 has 63 percent used space, while volume 2 has 0 percent used space (it was just added).

To distribute the percentage of used space evenly across volumes in a multivolume file domain, enter:

# balance file_domain_name

The balance command is transparent to users and applications and does not affect data availability or split files. Because files are not split, file domains with very large files may not balance as evenly as file domains with smaller files, and you might need to move (migrate) large files to a different volume in a multivolume file domain.

To determine if you should move a file, enter:

# showfile -x file_name

Information similar to the following is displayed:

    Id Vol PgSz Pages XtntType  Segs  SegSz  I/O  Perf  File
8.8002   1   16    11   simple    **     ** async  18%  src
 
             extentMap: 1
        pageOff    pageCnt     vol    volBlock    blockCnt
              0          1       1      187296          16
              1          1       1      187328          16
              2          1       1      187264          16
              3          1       1      187184          16
              4          1       1      187216          16
              5          1       1      187312          16
              6          1       1      187280          16
              7          1       1      187248          16
              8          1       1      187344          16
              9          1       1      187200          16
             10          1       1      187232          16
        extentCnt: 11

The file in the previous example is a good candidate to move to another volume because it has 11 extents and an 18 percent performance efficiency as shown in the Perf field. A high percentage indicates optimal efficiency.

To move a file to a different volume in the file domain, enter:

# migrate [-p pageoffset] [-n pagecount] [-s volumeindex_from] \
[-d volumeindex_to] file_name

You can specify the volume from which and to which a file is to be moved, or allow the system to pick the best space in the file domain. You can move either an entire file or specific pages to a different volume.
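For example, to move the entire src file shown in the previous showfile output from volume 1 to volume 2 (the volume indexes here are illustrative), you might enter:

# migrate -s 1 -d 2 src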

Note that using the balance utility after moving files might move files to a different volume.

See showfdmn(8), migrate(8), and balance(8) for more information.

9.2.2.5    Striping Data

You can use AdvFS, LSM, or hardware RAID to stripe (distribute) data. Striped data is data that is separated into units of equal size, then written to two or more disks, creating a stripe of data. The data can be simultaneously written if there are two or more units and the disks are on different SCSI buses.

Figure 9-2 shows how a 384-KB write request is separated into six 64-KB data units and written to three disks as two complete stripes.

Figure 9-2:  Striping Data

In general, you should use only one method to stripe data. In some specific cases using multiple striping methods can improve performance but only if:

See stripe(8) for more information about using AdvFS to stripe data. See the Logical Storage Manager manual for more information about using LSM to stripe data. See your storage hardware documentation for more information about using hardware RAID to stripe data.
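As a sketch of AdvFS striping, the following command might be used to stripe a newly created (empty) file across three volumes of its file domain; the file name and volume count are illustrative, and the exact options and restrictions are described in stripe(8):

# stripe -n 3 datafile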

9.2.2.6    Defragmenting a File Domain

An extent is a contiguous area of disk space that AdvFS allocates to a file. Extents consist of one or more 8-KB pages. When storage is added to a file, it is grouped in extents. If all data in a file is stored in contiguous blocks, the file has one file extent. However, as files grow, contiguous blocks on the disk may not be available to accommodate the new data, so the file must be spread over discontiguous blocks and multiple file extents.

File I/O is most efficient when there are few extents. If a file consists of many small extents, AdvFS requires more I/O processing to read or write the file. Disk fragmentation can result in many extents and may degrade read and write performance because many disk addresses must be examined to access a file. In addition, if a domain has a large number of small files, you may prematurely run out of disk space due to fragmentation.

To display fragmentation information for a file domain, enter:

# defragment -vn file_domain_name

Information similar to the following is displayed:

 defragment: Gathering data for 'staff_dmn'
 Current domain data:
   Extents:                 263675
   Files w/ extents:        152693
   Avg exts per file w/exts:  1.73
   Aggregate I/O perf:         70%
   Free space fragments:     85574
                <100K   <1M   <10M   >10M
   Free space:   34%   45%    19%     2%
   Fragments:  76197  8930    440      7
 
 

Ideally, you want few extents for each file.

Although the defragment command does not affect data availability and is transparent to users and applications, it can be a time-consuming process and requires disk space. You should run the defragment command during low file system activity as part of regular file system maintenance or if you experience problems because of excessive fragmentation.

There is little performance benefit from defragmenting a file domain that contains files smaller than 8 KB, is used in a mail server, or is read-only.

You can also use the showfile command to check a file's fragmentation. See Section 9.2.3.4 for information.

See defragment(8) for more information.

9.2.2.7    Decreasing the I/O Transfer Size

AdvFS attempts to transfer data to and from the disk in sizes that are the most efficient for the device driver. This value is provided by the device driver and is called the preferred transfer size. AdvFS uses the preferred transfer size to:

Generally, the I/O transfer size provided by the device driver is the most efficient. However, in some cases you may want to reduce the AdvFS I/O transfer size. For example, if your AdvFS fileset is using LSM volumes, the preferred transfer size might be very high. This could cause the cache to be unduly diluted by the buffers for the files being read. If this is suspected, reducing the read transfer size may alleviate the problem.

For systems with impaired mmap page faulting or with limited memory, you should limit the read transfer size to limit the amount of data that is prefetched; however, this will limit I/O consolidation for all reads from this disk.

To display the I/O transfer sizes for a disk, enter:

# chvol -l block_special_device_name domain

To modify the read I/O transfer size, enter:

# chvol -r blocks block_special_device_name domain

To modify the write I/O transfer size, enter:

# chvol -w blocks block_special_device_name domain
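For example, to set the read transfer size of the dsk4a volume in the usr_domain domain shown earlier to 256 blocks (an illustrative value only), you might enter:

# chvol -r 256 /dev/disk/dsk4a usr_domain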

See chvol(8) for more information.

Each device driver has a minimum and maximum value for the I/O transfer size. If you use an unsupported value, the device driver automatically limits the value to either the largest or smallest I/O transfer size it supports. See your device driver documentation for more information on supported I/O transfer sizes.

9.2.2.8    Moving the Transaction Log

The AdvFS transaction log should be located on a fast or uncongested disk and bus; otherwise, performance might be degraded.

To display volume information, enter:

# showfdmn file_domain_name

Information similar to the following is displayed:

               Id              Date Created  LogPgs  Domain Name
35ab99b6.000e65d2  Tue Jul 14 13:47:34 1998     512  staff_dmn
 
  Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
   3L     262144      154512     41%     on    256    256  /dev/rz13a
   4      786432      452656     42%     on    256    256  /dev/rz13b
      ----------  ----------  ------
         1048576      607168     42%

In the showfdmn command display, the letter L is displayed next to the volume that contains the transaction log.

If the transaction log is located on a slow or busy disk, you can:

See showfdmn(8), switchlog(8), vdump(8), and vrestore(8) for more information.

9.2.3    Displaying AdvFS Information

Table 9-2 describes the commands you can use to display AdvFS information.

Table 9-2:  Commands to Display AdvFS Information

To Display Command

AdvFS performance statistics (Section 9.2.3.1)

advfsstat

Disks in a file domain (Section 9.2.3.2)

advscan

Information about AdvFS file domains and volumes (Section 9.2.3.3)

showfdmn

AdvFS fileset information for a file domain (Section 9.2.3.5)

showfsets

Information about files in an AdvFS fileset (Section 9.2.3.4)

showfile

A formatted page of the BMT (Section 9.2.3.6)

vbmtpg

9.2.3.1    Displaying AdvFS Performance Statistics

To display detailed information about a file domain, including use of the UBC and namei cache, fileset vnode operations, locks, bitfile metadata table (BMT) statistics, and volume I/O performance, enter:

# advfsstat -v [-i number_of_seconds] file_domain

Information, in units of one disk block (512 bytes), similar to the following is displayed:

vol1   rd  wr  rg  arg  wg  awg  blk  flsh  wlz  sms rlz  con  dev   
       54   0  48  128   0    0    0    0    1    0   0    0   65

You can use the -i option to display information at specific time intervals, in seconds.

The previous example displays:

To display the number of file creates, reads, writes, and other operations for a specified domain or fileset, enter:

# advfsstat [-i number_of_seconds] -f 2 file_domain file_set

Information similar to the following is displayed:

  lkup  crt geta read writ fsnc dsnc   rm   mv rdir  mkd  rmd link
     0    0    0    0    0    0    0    0    0    0    0    0    0
     4    0   10    0    0    0    0    2    0    2    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
    24    8   51    0    9    0    0    3    0    0    4    0    0
  1201  324 2985    0  601    0    0  300    0    0    0    0    0
  1275  296 3225    0  655    0    0  281    0    0    0    0    0
  1217  305 3014    0  596    0    0  317    0    0    0    0    0
  1249  304 3166    0  643    0    0  292    0    0    0    0    0
  1175  289 2985    0  601    0    0  299    0    0    0    0    0
   779  148 1743    0  260    0    0  182    0   47    0    4    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0

The following table describes the headings in the previous example:

Heading   Displays Number Of

lkup      file lookups
crt       file creates
geta      get attributes
read      file reads
writ      file writes
fsnc      file syncs
dsnc      data syncs
rm        file removes
mv        file renames
rdir      directory reads
mkd       make directories
rmd       remove directories
link      links created

See advfsstat(8) for more information.

9.2.3.2    Displaying Disks in an AdvFS File Domain

Use the advscan command:

To display AdvFS volumes on devices or in an LSM disk group, enter:

# advscan device | LSM_disk_group

Information similar to the following is displayed:

Scanning disks  dsk0 dsk5 
Found domains: 
usr_domain
          Domain Id       2e09be37.0002eb40                 
          Created         Thu Jun 26 09:54:15 1998                 
          Domain volumes          2
          /etc/fdmns links        2                 
          Actual partitions found:
                                  dsk0c                     
                                  dsk5c

To recreate missing domains on a device, enter:

# advscan -r device

Information similar to the following is displayed:

Scanning disks  dsk6 
Found domains: *unknown*      
          Domain Id       2f2421ba.0008c1c0                 
          Created         Mon Jan 20 13:38:02 1998                   
          Domain volumes          1   
          /etc/fdmns links        0                   
          Actual partitions found:                                         
                                  dsk6a*    
*unknown*       
         Domain Id       2f535f8c.000b6860                 
         Created         Tue Feb 25 09:38:20 1998                   
         Domain volumes          1    
         /etc/fdmns links        0                   
         Actual partitions found:
                                 dsk6b*    
 
Creating /etc/fdmns/domain_dsk6a/
        linking dsk6a   
Creating /etc/fdmns/domain_dsk6b/         
        linking dsk6b

See advscan(8) for more information.

9.2.3.3    Displaying AdvFS File Domains

To display information about a file domain, including the date created and the size and location of the transaction log, and information about each volume in the domain, including the size, the number of free blocks, the maximum number of blocks read and written at one time, and the device special file, enter:

# showfdmn file_domain

Information similar to the following is displayed:

               Id              Date Created  LogPgs  Version  Domain Name
34f0ce64.0004f2e0  Wed Mar 17 15:19:48 1999     512        4  root_domain
 
  Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name 
   1L     262144       94896     64%     on    256    256  /dev/disk/dsk0a
 
 

For multivolume domains, the showfdmn command also displays the total volume size, the total number of free blocks, and the total percentage of volume space currently allocated.

See showfdmn(8) for more information about the output of the command.

9.2.3.4    Displaying AdvFS File Information

To display detailed information about files (and directories) in an AdvFS fileset, enter:

# showfile * | file_name

The * displays the AdvFS characteristics for all of the files in the current working directory.

Information similar to the following is displayed:

         Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  I/O   Perf  File
  23c1.8001    1    16      1    simple    **     **  ftx   100%  OV
  58ba.8004    1    16      1    simple    **     **  ftx   100%  TT_DB
         **   **    **     **   symlink    **     **   **     **  adm
  239f.8001    1    16      1    simple    **     **  ftx   100%  advfs
         **   **    **     **   symlink    **     **   **     **  archive
     9.8001    1    16      2    simple    **     **  ftx   100%  bin (index)
         **   **    **     **   symlink    **     **   **     **  bsd
         **   **    **     **   symlink    **     **   **     **  dict
   288.8001    1    16      1    simple    **     **  ftx   100%  doc
   28a.8001    1    16      1    simple    **     **  ftx   100%  dt
         **   **    **     **   symlink    **     **   **     **  man
  5ad4.8001    1    16      1    simple    **     **  ftx   100%  net
         **   **    **     **   symlink    **     **   **     **  news
   3e1.8001    1    16      1    simple    **     **  ftx   100%  opt
         **   **    **     **   symlink    **     **   **     **  preserve
         **   **    **     **     advfs    **     **   **     **  quota.group
         **   **    **     **     advfs    **     **   **     **  quota.user
     b.8001    1    16      2    simple    **     **  ftx   100%  sbin (index)
         **   **    **     **   symlink    **     **   **     **  sde
   61d.8001    1    16      1    simple    **     **  ftx   100%  tcb
         **   **    **     **   symlink    **     **   **     **  tmp
         **   **    **     **   symlink    **     **   **     **  ucb
  6df8.8001    1    16      1    simple    **     **  ftx   100%  users

The following table describes the headings in the previous example:

Heading Displays

Id

The unique number (in hexadecimal format) that identifies the file. Digits to the left of the dot (.) character are equivalent to a UFS inode.

Vol

The location of primary metadata for the file, expressed as a number. The data extents of the file can reside on another volume.

PgSz

The page size in 512-byte blocks.

Pages

The number of pages allocated to the file.

XtntType

The extent type can be simple, which is a regular AdvFS file without special extents; stripe, which is a striped file; symlink, which is a symbolic link to a file; ufs, nfsv3, and so on.

The showfile command cannot display attributes for symbolic links or non-AdvFS files.

Segs

The number of stripe segments per striped file, which is the number of volumes a striped file crosses. (Applies only to stripe type.)

SegSz

The number of pages per stripe segment. (Applies only to stripe type.)

I/O

The type of write requests to this file. If async, write requests are buffered (the AdvFS default). If sync, forced synchronous writes. If ftx, write requests executed under AdvFS transaction control, which is reserved for metadata files and directories.

Perf

The efficiency of file-extent allocation, expressed as a percentage of the optimal extent layout. A high percentage indicates that the AdvFS I/O system has achieved optimal efficiency. A low percentage indicates the need for file defragmentation.

See showfile(8) for more information about the command output.

9.2.3.5    Displaying the AdvFS Filesets in a File Domain

To display information about the filesets in a file domain, including the fileset names, the total number of files, the number of used blocks, the quota status, and the clone status, enter:

# showfsets file_domain

Information similar to the following is displayed:

mnt
  Id           : 2c73e2f9.000f143a.1.8001
  Clone is     : mnt_clone
  Files        :     7456,  SLim= 60000, HLim=80000  
  Blocks  (1k) :   388698,  SLim= 6000,  HLim=8000  
  Quota Status : user=on  group=on
 
mnt_clone
  Id           : 2c73e2f9.000f143a.2.8001
  Clone of     : mnt          
  Revision     : 2

The previous example shows a file domain that has one fileset (mnt) and one clone fileset (mnt_clone).

See showfsets(8) for information.

9.2.3.6    Displaying the Bitmap Metadata Table

The AdvFS fileset data structure (metadata) is stored in a file called the bitfile metadata table (BMT). Each volume in a domain has a BMT that describes the file extents on the volume. If a domain has multiple volumes of the same size, files will be distributed evenly among the volumes.

The BMT is the equivalent of the UFS inode table. However, the UFS inode table is statically allocated, while the BMT expands as more files are added to the domain. Each time AdvFS needs additional metadata space, the BMT grows by a fixed size (the default is 128 pages). As a volume becomes increasingly fragmented, each extension of the BMT might itself be described by several extents.

To display a formatted page of the BMT, enter:

# vbmtpg volume

Information similar to the following is displayed:

PAGE LBN    32 megaVersion   0 nextFreePg   0 freeMcellCnt   0 pageId    0
nextfreeMCId page    0 cell    0
==========================================================================
CELL   0   nextVdIndex     0 linkSegment     0   tag,bfSetTag:     0,    0
nextMCId page    0 cell    0
CELL   1   nextVdIndex     0 linkSegment     0   tag,bfSetTag:     0,    0
nextMCId page    0 cell    0
CELL   2   nextVdIndex     0 linkSegment     0   tag,bfSetTag:     0,    0
nextMCId page    0 cell    0
CELL   3   nextVdIndex     0 linkSegment     0   tag,bfSetTag:     0,    0
nextMCId page    0 cell    0
.
.
.
CELL  21   nextVdIndex   267 linkSegment   779   tag,bfSetTag:    10,    0
nextMCId page16787458 cell   16
CELL  22   nextVdIndex  1023 linkSegment     0   tag,bfSetTag: 42096,46480
nextMCId page67126700 cell   16
CELL  23   nextVdIndex     4 linkSegment     0   tag,bfSetTag:-2147483648,    1
nextMCId page    0 cell    1
CELL  24   nextVdIndex     0 linkSegment     0   tag,bfSetTag:332144,    0
nextMCId page  585 cell   16
CELL  25   nextVdIndex 29487 linkSegment 26978   tag,bfSetTag:1684090734,1953325
686
nextMCId page    0 cell    0
==========================================================================
 
RECORD  0 bcnt26739 version105        type 108 *** unknown ***
 
CELL  26   nextVdIndex     0 linkSegment     0   tag,bfSetTag:1879048193,    2
nextMCId page    0 cell    0
CELL  27   nextVdIndex     0 linkSegment     0   tag,bfSetTag:     0, 1023
nextMCId page   31 cell   31

See vbmtpg(8) for more information.

You can also invoke the showfile command and specify mount_point/.tags/M-10 to examine the BMT extents on the first domain volume that contains the fileset mounted on the specified mount point. To examine the extents of the other volumes in the domain, specify M-16, M-24, and so on. If the extents at the end of the BMT are smaller than the extents at the beginning of the file, the BMT is becoming fragmented. See showfile(8) for more information.

9.3    Tuning UFS

This section describes UFS configuration and tuning guidelines and commands that you can use to display UFS information.

9.3.1    UFS Configuration Guidelines

Table 9-3 lists UFS configuration guidelines and performance benefits and tradeoffs.

Table 9-3:  UFS Configuration Guidelines

Benefit Guideline Tradeoff
Improve performance for small files

Make the file system fragment size equal to the block size (Section 9.3.1.1)

Wastes disk space for small files
Improve performance for large files

Use the default file system fragment size of 1 KB (Section 9.3.1.1)

Increases the overhead for large files
Free disk space and improve performance for large files

Reduce the density of inodes on a file system (Section 9.3.1.2)

Reduces the number of files that can be created
Improve performance for disks that do not have a read-ahead cache

Set rotational delay (Section 9.3.1.3)

None
Decrease the number of disk I/O operations

Increase the number of blocks combined for a cluster (Section 9.3.1.4)

None
Improve performance

Use a Memory File System (MFS) (Section 9.3.1.5)

Does not ensure data integrity because of cache volatility
Control disk space usage

Use disk quotas (Section 9.3.1.6)

Might result in a slight increase in reboot time
Allow more mounted file systems

Increase the maximum number of UFS and MFS mounts (Section 9.3.1.7)

Requires additional memory resources

9.3.1.1    Modifying the File System Fragment and Block Sizes

The UFS file system block size is 8 KB. The default fragment size is 1 KB. You can use the newfs command to set the fragment size to 1024, 2048, 4096, or 8192 bytes when you create the file system.

Although the default fragment size uses disk space efficiently, it increases the overhead for files less than 96 KB. If the average file in a file system is less than 96 KB, you might improve disk access time and decrease system overhead by making the file system fragment size equal to the default block size (8 KB).
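For example, a minimal sketch of creating a file system whose fragment size equals the 8-KB block size (the device name is hypothetical):

# newfs -b 8192 -f 8192 /dev/rdisk/dsk2c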

See newfs(8) for more information.

9.3.1.2    Reducing the Density of inodes

An inode describes an individual file in the file system. The maximum number of files in a file system depends on the number of inodes and the size of the file system. The system creates an inode for each 4 KB (4096 bytes) of data space in a file system.

If a file system will contain many large files and you are sure that you will not create a file for each 4 KB of space, you can reduce the density of inodes on the file system. This will free disk space for file data, but reduces the number of files that can be created.

To do this, use the newfs -i command to specify the amount of data space allocated for each inode when you create the file system. See newfs(8) for more information.
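For example, a minimal sketch that allocates one inode for each 8 KB of data space instead of one for each 4 KB (the device name is hypothetical):

# newfs -i 8192 /dev/rdisk/dsk3c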

9.3.1.3    Setting the Rotational Delay

The UFS rotdelay parameter specifies the time, in milliseconds, to service a transfer completion interrupt and initiate a new transfer on the same disk. It is used to decide how much rotational spacing to place between successive blocks in a file. By default, the rotdelay parameter is set to 0 to allocate blocks continuously. It is useful to set rotdelay on disks that do not have a read-ahead cache. For disks with cache, set the rotdelay to 0.

Use either the tunefs command or the newfs command to modify the rotdelay value. See newfs(8) and tunefs(8) for more information.
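As a sketch, the following command might be used to set a rotdelay of 4 milliseconds on an existing file system, assuming the -d option described in tunefs(8); the device name and value are illustrative only:

# tunefs -d 4 /dev/rdisk/dsk2c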

9.3.1.4    Increasing the Number of Blocks Combined for a Cluster

The value of the UFS maxcontig parameter specifies the number of blocks that can be combined into a single cluster (or file-block group). The default value of maxcontig is 8. The file system attempts I/O operations in a size that is determined by the value of maxcontig multiplied by the block size (8 KB).

Device drivers that can chain several buffers together in a single transfer should use a maxcontig value that is equal to the maximum chain length. This may reduce the number of disk I/O operations.

Use the tunefs command or the newfs command to change the value of maxcontig. See newfs(8) and tunefs(8) for more information.
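As a sketch, the following command might be used to raise maxcontig to 16 blocks, assuming the -a option described in tunefs(8); the device name and value are illustrative only:

# tunefs -a 16 /dev/rdisk/dsk2c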

9.3.1.5    Using MFS

The Memory File System (MFS) is a UFS file system that resides only in memory. No permanent data or file structures are written to disk. An MFS can improve read/write performance, but it is a volatile cache. The contents of an MFS are lost after a reboot, unmount operation, or power failure.

Because no data is written to disk, an MFS is a very fast file system and can be used to store temporary files or read-only files that are loaded into the file system after it is created. For example, if you are performing a software build that would have to be restarted if it failed, use an MFS to cache the temporary files that are created during the build and reduce the build time.

See mfs(8) for information.

9.3.1.6    Using UFS Disk Quotas

You can specify UFS file system limits for user accounts and for groups by setting up UFS disk quotas, also known as UFS file system quotas. You can apply quotas to file systems to establish a limit on the number of blocks and inodes (or files) that a user account or a group of users can allocate. You can set a separate quota for each user or group of users on each file system.
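As a sketch (assuming the file system has been prepared for quotas as described in the System Administration manual), you might edit the limits for a hypothetical user and then turn quotas on for a hypothetical file system:

# edquota user1
# quotaon /usr/users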

You may want to set quotas on file systems that contain home directories, because the sizes of these file systems can increase more significantly than other file systems. Do not set quotas on the /tmp file system.

Note that, unlike AdvFS quotas, UFS quotas may cause a slight increase in reboot time. See the AdvFS Administration manual for information about AdvFS quotas. See the System Administration manual for information about UFS quotas.

9.3.1.7    Increasing the Number of UFS and MFS Mounts

Mount structures are dynamically allocated when a mount request is made and subsequently deallocated when an unmount request is made.

The max_ufs_mounts attribute specifies the maximum number of UFS and MFS mounts on the system.

Value: 0 to 2,147,483,647

Default value: 1000 (file system mounts)

You can modify the max_ufs_mounts attribute without rebooting the system. See Section 3.6 for information about modifying kernel subsystem attributes.

Increase the maximum number of UFS and MFS mounts if your system will have more than the default limit of 1000 mounts.

Increasing the maximum number of UFS and MFS mounts enables you to mount more file systems. However, increasing the maximum number mounts requires memory resources for the additional mounts.
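As a sketch of a run-time change using the interface referenced in Section 3.6 (and assuming max_ufs_mounts belongs to the vfs subsystem), the following command raises the limit to an illustrative value of 2000 mounts:

# sysconfig -r vfs max_ufs_mounts=2000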

9.3.2    Displaying UFS Information

Table 9-4 describes the commands you can use to display UFS information.

Table 9-4:  Commands to Display UFS Information

To Display Command

UFS information (Section 9.3.2.1)

dumpfs

UFS clustering statistics (Section 9.3.2.2)

(dbx) print ufs_clusterstats

Metadata buffer cache statistics (Section 9.3.2.3)

(dbx) print bio_stats

9.3.2.1    Displaying UFS Information

To display UFS information for a specified file system, including super block and cylinder group information, enter:

# dumpfs filesystem | /devices/disk/device_name

Information similar to the following is displayed:

 magic   11954   format  dynamic time   Tue Sep 14 15:46:52 1999 
nbfree  21490   ndir    9       nifree  99541  nffree  60 
ncg     65      ncyl    1027    size    409600  blocks  396062
bsize   8192    shift   13      mask    0xffffe000 
fsize   1024    shift   10      mask    0xfffffc00 
frag    8       shift   3       fsbtodb 1 
cpg     16      bpg     798     fpg     6384    ipg     1536 
minfree 10%     optim   time    maxcontig 8     maxbpg  2048 
rotdelay 0ms    headswitch 0us  trackseek 0us   rps     60

The information contained in the first lines is relevant for tuning. Of specific interest are the following fields:

9.3.2.2    Monitoring UFS Clustering

To display how the system is performing cluster read and write transfers, use the dbx print command to examine the ufs_clusterstats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem  
(dbx) print ufs_clusterstats

Information similar to the following is displayed:

struct {
    full_cluster_transfers = 3130
    part_cluster_transfers = 9786
    non_cluster_transfers = 16833
    sum_cluster_transfers = {
        [0] 0
        [1] 24644
        [2] 1128
        [3] 463
        [4] 202
        [5] 55
        [6] 117
        [7] 36
        [8] 123
        [9] 0
         .
         .
         .
       [33]
 
    }
}
(dbx)

The previous example shows 24644 single-block transfers, 1128 double-block transfers, 463 triple-block transfers, and so on.

You can use the dbx print command to examine cluster reads and writes by specifying the ufs_clusterstats_read and ufs_clusterstats_write data structures respectively.

9.3.2.3    Displaying the Metadata Buffer Cache

To display statistics on the metadata buffer cache, including superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries, use the dbx print command to examine the bio_stats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem  
(dbx) print bio_stats

Information similar to the following is displayed:

struct {
    getblk_hits = 4590388
    getblk_misses = 17569
    getblk_research = 0
    getblk_dupbuf = 0
    getnewbuf_calls = 17590
    getnewbuf_buflocked = 0
    vflushbuf_lockskips = 0
    mntflushbuf_misses = 0
    mntinvalbuf_misses = 0
    vinvalbuf_misses = 0
    allocbuf_buflocked = 0
    ufssync_misses = 0
}

The number of block misses (getblk_misses) divided by the sum of block misses and block hits (getblk_hits) should not be more than 3 percent. If the number of block misses is high, you might want to increase the value of the bufcache attribute. See Section 9.1.3 for information on increasing the value of the bufcache attribute.
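For example, with the sample bio_stats output shown above, the miss rate is 17569 / (17569 + 4590388), or approximately 0.4 percent, which is well below the 3 percent threshold, so the bufcache attribute would not need to be increased.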

9.3.3    Tuning UFS for Performance

Table 9-5 lists UFS tuning guidelines and performance benefits and tradeoffs.

Table 9-5:  UFS Tuning Guidelines

Benefit Guideline Tradeoff
Improve performance

Adjust UFS smoothsync and I/O throttling for asynchronous UFS I/O requests (Section 9.3.3.1)

None
Free CPU cycles and reduce the number of I/O operations

Delay UFS cluster writing (Section 9.3.3.2)

If I/O throttling is not used, might degrade real-time workload performance when buffers are flushed
Reduce the number of disk I/O operations

Increase the number of combine blocks for a cluster (Section 9.3.3.3)

Might require more memory to buffer data
Improve read and write performance

Defragment the file system (Section 9.3.3.4)

Requires down time

9.3.3.1    Adjusting UFS Smooth Sync and I/O Throttling

UFS uses smoothsync and I/O throttling to improve UFS performance and to minimize system stalls resulting from a heavy system I/O load.

Smoothsync allows each dirty page to age for a specified time period before going to disk. This allows more opportunity for frequently modified pages to be found in the cache, thus decreasing the I/O load. Also, spikes in which large numbers of dirty pages are locked on the device queue are minimized because pages are enqueued to a device after having aged sufficiently, as opposed to getting flushed by the update daemon.

I/O throttling further addresses the concern of locking dirty pages on the device queue. It enforces a limit on the number of delayed I/O requests allowed to be on the device queue at any point in time. This allows the system to be more responsive to any synchronous requests added to the device queue, such as a read or the loading of a new program into memory. This can also decrease the amount and duration of process stalls for specific dirty buffers, as pages remain available until placed on the device queue.

Related Attributes

The vfs subsystem attributes that affect smoothsync and throttling are:

You can modify the smoothsync_age attribute, the io_throttle_static attribute, and the io_throttle_maxmzthruput attribute without rebooting the system.
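As a sketch of a run-time change using the interface referenced in Section 3.6, the following command sets the dirty-page aging period to an illustrative value of 30 seconds:

# sysconfig -r vfs smoothsync_age=30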

9.3.3.2    Delaying UFS Cluster Writing

By default, clusters of UFS pages are written asynchronously. You can configure clusters of UFS pages to be written as delayed I/O, in the same way that other modified data and metadata pages are written.

Related Attribute

The delay_wbuffers attribute specifies whether clusters of UFS pages are written asynchronously or delayed.

Value: 0 or 1

Default value: 0 (asynchronously)

If the percentage of UBC dirty pages reaches the value of the delay_wbuffers_percent attribute, the clusters will be written asynchronously, regardless of the value of the delay_wbuffers attribute.

When to Tune

Delay writing clusters of UFS pages if your applications frequently write to previously written pages. This can result in a decrease in the total number of I/O requests. However, if you are not using I/O throttling, it might adversely affect real-time workload performance because the system will experience a heavy I/O load at sync time.

To delay writing clusters of UFS pages, use the dbx patch command to set the value of the delay_wbuffers kernel variable to 1 (enabled).
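A minimal sketch of that procedure, assuming the dbx patch syntax described in Section 3.6.7:

# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) patch delay_wbuffers = 1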

See Section 3.6.7 for information about using dbx.

9.3.3.3    Increasing the Number of Blocks in a Cluster

UFS combines contiguous blocks into clusters to decrease I/O operations. You can specify the number of blocks in a cluster.

Related Attribute

The cluster_maxcontig attribute specifies the number of blocks that are combined into a single I/O operation.

Default value: 32 blocks

If the specific filesystem's rotational delay value is 0 (default), then UFS attempts to create clusters with up to n blocks, where n is either the value of the cluster_maxcontig attribute or the value from device geometry, whichever is smaller.

If the specific filesystem's rotational delay value is non-zero, then n is the value of the cluster_maxcontig attribute, the value from device geometry, or the value of the maxcontig file system attribute, whichever is smaller.

When to Tune

Increase the number of blocks combined for a cluster if your applications can use a large cluster size.

You can use the newfs command to set the filesystem rotational delay value and the value of the maxcontig attribute. You can use the dbx command to set the value of the cluster_maxcontig attribute.

9.3.3.4    Defragmenting a File System

When a file consists of noncontiguous file extents, the file is considered fragmented. A very fragmented file decreases UFS read and write performance, because it requires more I/O operations to access the file.

When to Perform

Defragmenting a UFS file system improves file system performance. However, it is a time-consuming process.

You can determine whether the files in a file system are fragmented by determining how effectively the system is clustering. You can do this by using the dbx print command to examine the ufs_clusterstats data structure. See Section 9.3.2.2 for information.

UFS block clustering is usually efficient. If the numbers from the UFS clustering kernel structures show that clustering is not effective, the files in the file system may be very fragmented.

Recommended Procedure

To defragment a UFS file system, follow these steps:

  1. Back up the file system onto tape or another partition.

  2. Create a new file system either on the same partition or a different partition.

  3. Restore the file system.

See the System Administration manual for information about backing up and restoring data and creating UFS file systems.

9.4    Tuning NFS

The Network File System (NFS) shares the Unified Buffer Cache (UBC) with the virtual memory subsystem and local file systems. NFS can put an extreme load on the network. Poor NFS performance is almost always a problem with the network infrastructure. Look for high counts of retransmitted messages on the NFS clients, network I/O errors, and routers that cannot maintain the load.

Lost packets on the network can severely degrade NFS performance. Lost packets can be caused by a congested server, the corruption of packets during transmission (which can be caused by bad electrical connections, noisy environments, or noisy Ethernet interfaces), and routers that abandon forwarding attempts too quickly.

You can monitor NFS by using the nfsstat and other commands. When evaluating NFS performance, remember that NFS does not perform well if any file-locking mechanisms are in use on an NFS file. The locks prevent the file from being cached on the client. See nfsstat(8) for more information.

The following sections describe how to display NFS information and attributes that you might be able to tune to improve NFS performance.

9.4.1    Displaying NFS Information

Table 9-6 describes the commands you can use to display NFS information.

Table 9-6:  Commands to Display NFS Information

To Display Command

Network and NFS statistics (Section 9.4.1.1)

nfsstat

Information about idle threads (Section 9.4.1.2)

ps axlmp

All incoming network traffic to an NFS server

nfswatch

Active NFS server threads (Section 3.6.7)

(dbx) print nfs_sv_active_hist

Metadata buffer cache statistics (Section 9.3.2.3)

(dbx) print bio_stats

9.4.1.1    Displaying Network and NFS Statistics

To display or reinitialize NFS and Remote Procedure Call (RPC) statistics for clients and servers, including the number of packets that had to be retransmitted (retrans) and the number of times a reply transaction ID did not match the request transaction ID (badxid), enter:

# /usr/ucb/nfsstat

Information similar to the following is displayed:

Server rpc:
calls     badcalls  nullrecv   badlen   xdrcall
38903     0         0          0        0
 
Server nfs:
calls     badcalls
38903     0
 
Server nfs V2:
null      getattr   setattr    root     lookup     readlink   read
5  0%     3345  8%  61  0%     0  0%    5902 15%   250  0%    1497  3%
wrcache   write     create     remove   rename     link       symlink
0  0%     1400  3%  549  1%    1049  2% 352  0%    250  0%    250  0%
mkdir     rmdir     readdir    statfs
171  0%   172  0%   689  1%    1751  4%
 
Server nfs V3:
null      getattr   setattr    lookup    access    readlink   read
0  0%     1333  3%  1019  2%   5196 13%  238  0%   400  1%    2816  7%
write     create    mkdir      symlink   mknod     remove     rmdir
2560  6%  752  1%   140  0%    400  1%   0  0%     1352  3%   140  0%
rename    link      readdir    readdir+  fsstat    fsinfo     pathconf
200  0%   200  0%   936  2%    0  0%     3504  9%  3  0%      0  0%
commit
21  0%
 
Client rpc:
calls     badcalls  retrans    badxid    timeout   wait       newcred
27989     1         0          0         1         0          0
badverfs  timers
0         4
 
Client nfs:
calls     badcalls  nclget     nclsleep
27988     0         27988      0
 
Client nfs V2:
null      getattr   setattr    root      lookup    readlink   read
0  0%     3414 12%  61  0%     0  0%     5973 21%  257  0%    1503  5%
wrcache   write     create     remove    rename    link       symlink
0  0%     1400  5%  549  1%    1049  3%  352  1%   250  0%    250  0%
mkdir     rmdir     readdir    statfs
171  0%   171  0%   713  2%    1756  6%
 
Client nfs V3:
null      getattr   setattr    lookup    access    readlink   read
0  0%     666  2%   9  0%      2598  9%  137  0%   200  0%    1408  5%
write     create    mkdir      symlink   mknod     remove     rmdir
1280  4%  376  1%   70  0%     200  0%   0  0%     676  2%    70  0%
rename    link      readdir    readdir+  fsstat    fsinfo     pathconf
100  0%   100  0%   468  1%    0  0%     1750  6%  1  0%      0  0%
commit
10  0%
 

The ratio of timeouts to calls (which should not exceed 1 percent) is the most important thing to look for in the NFS statistics. A timeout-to-call ratio greater than 1 percent can have a significant negative impact on performance. See Chapter 10 for information on how to tune your system to avoid timeouts.
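
For example, the Client rpc section of the preceding output shows 1 timeout in 27989 calls, a ratio of roughly 0.004 percent, which is well under the 1 percent threshold. You can do the arithmetic with the bc command, substituting your own timeout and calls values for the sample figures shown here:

# echo "scale=4; 1 * 100 / 27989" | bc
.0035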

To display NFS and RPC statistics at a repeating interval (in seconds), enter:

# /usr/ucb/nfsstat -s -i number

The following example displays NFS and RPC information in 10-second intervals:

# /usr/ucb/nfsstat -s -i 10

If you are monitoring an experimental situation with nfsstat, reset the NFS counters to 0 before you begin the experiment. To reset counters to 0, enter:

# /usr/ucb/nfsstat -z

See nfsstat(8) for more information about command options and output.

9.4.1.2    Displaying Idle Thread Information

On a client system, the nfsiod daemon spawns several I/O threads to service asynchronous I/O requests to the server. The I/O threads improve the performance of both NFS reads and writes. The optimum number of I/O threads depends on many variables, such as how quickly the client will be writing, how many files will be accessed simultaneously, and the characteristics of the NFS server. For most clients, seven threads are sufficient.

To display idle I/O threads on a client system, enter:

# /usr/ucb/ps axlmp 0 | grep nfs

Information similar to the following is displayed:

 0  42   0            nfsiod_  S                 0:00.52                 
 0  42   0            nfsiod_  S                 0:01.18                 
 0  42   0            nfsiod_  S                 0:00.36                 
 0  44   0            nfsiod_  S                 0:00.87                 
 0  42   0            nfsiod_  S                 0:00.52                 
 0  42   0            nfsiod_  S                 0:00.45                 
 0  42   0            nfsiod_  S                 0:00.74                 
 
# 

The previous example shows a sufficient number of sleeping client I/O threads. Server threads started by the nfsd daemon appear in similar output, with nfsiod_ replaced by nfs_tcp or nfs_udp.

If your output shows that few threads are sleeping, you might improve NFS performance by increasing the number of threads. See Section 9.4.2.1, Section 9.4.2.2, nfsiod(8), and nfsd(8) for more information.
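
If you only want a count of the idle I/O threads rather than the full listing, you can pipe the same command through grep -c, as in the following example, which counts the nfsiod_ threads shown in the previous output:

# /usr/ucb/ps axlmp 0 | grep -c nfsiod_
7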

9.4.2    Improving NFS Performance

Improving performance on a system that is used only for serving NFS differs from tuning a system that is used for general timesharing, because an NFS server runs only a few small user-level programs, which consume few system resources. There is minimal paging and swapping activity, so memory resources should be focused on caching file system data.

File system tuning is important for NFS because processing NFS requests consumes the majority of CPU and wall clock time. Ideally, the UBC hit rate should be high. Increasing the UBC hit rate can require additional memory or a reduction in the size of other file system caches. In general, file system tuning will improve the performance of I/O-intensive user applications.

In addition, file data remains cached only while a vnode exists for the file. If you are using AdvFS, an access structure must also exist for the file.

If you are running NFS over TCP, tuning TCP may improve performance if there are many active clients. See Section 10.2 for more information. However, if you are running NFS over UDP, little or no TCP tuning is needed.

Table 9-7 lists NFS configuration guidelines and performance benefits and tradeoffs.

Table 9-7:  NFS Tuning Guidelines

Benefit:    Enable efficient I/O blocking operations
Guideline:  Configure the appropriate number of threads on an NFS server (Section 9.4.2.1)
Tradeoff:   None

Benefit:    Enable efficient I/O blocking operations
Guideline:  Configure the appropriate number of threads on the client system (Section 9.4.2.2)
Tradeoff:   None

Benefit:    Improve performance on slow or congested networks
Guideline:  Decrease network timeouts on the client system (Section 9.4.2.4)
Tradeoff:   Reduces the theoretical maximum performance

Benefit:    Improve network performance for read-only file systems and enable clients to quickly detect changes
Guideline:  Modify cache timeout limits on the client system (Section 9.4.2.3)
Tradeoff:   Increases network traffic to the server

9.4.2.1    Configuring Server Threads

The nfsd daemon runs on NFS servers and spawns a number of server threads that process NFS requests from client systems. At least one server thread must be running for a machine to operate as a server. The number of threads determines the number of parallel operations and must be a multiple of 8.

To improve performance on frequently used NFS servers, configure either 16 or 32 threads, which provides the most efficient blocking for I/O operations. See nfsd(8) for more information.
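
How the thread count is set persistently depends on how NFS was configured on the server. On many systems the count is recorded as a variable in /etc/rc.config and passed to nfsd when NFS starts at boot time. The following sketch assumes that the variable is named NUM_NFSD; that name is an assumption, so confirm the exact variable name in nfsd(8) and in your own /etc/rc.config before changing it:

# rcmgr get NUM_NFSD
# rcmgr set NUM_NFSD 16

The new value takes effect the next time the NFS daemons are restarted.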

9.4.2.2    Configuring Client Threads

Client systems use the nfsiod daemon to service asynchronous I/O operations, such as buffer cache read-ahead and delayed write operations. The nfsiod daemon spawns several I/O threads to service asynchronous I/O requests to its server. The I/O threads improve performance of both NFS reads and writes.

The optimal number of I/O threads to run depends on many variables, such as how quickly the client is writing data, how many files will be accessed simultaneously, and the behavior of the NFS server. The number of threads should be one less than a multiple of 8 (for example, 7 or 15).

NFS servers attempt to gather writes into complete UFS clusters before initiating I/O, and the number of threads plus 1 is the number of writes that a client can have outstanding at any one time. Having exactly 7 or 15 threads therefore produces the most efficient blocking for I/O operations. If write gathering is enabled and the client does not run any I/O threads, you might see degraded write performance. To disable write gathering, use the dbx patch command to set the nfs_write_gather kernel variable to zero, as shown in the following example. See Section 3.6.7 for more information.
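
For example, a minimal dbx session to make this change might look like the following; the invocation mirrors the dbx examples earlier in this chapter, and you should review Section 3.6.7 before modifying kernel variables on a production system:

# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) patch nfs_write_gather = 0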

Use the ps axlmp 0 | grep nfs command to display idle I/O threads on the client. If few threads are sleeping, you might improve NFS performance by increasing the number of threads. See nfsiod(8) for more information.

9.4.2.3    Modifying Cache Timeout Limits

For read-only file systems and slow network links, you might improve performance by changing the cache timeout limits on NFS client systems. These timeouts affect how quickly you see updates to a file or directory that was modified by another host. If you are not sharing files with users on other hosts, including the server system, increasing these values slightly improves performance and reduces the amount of network traffic that you generate.

See mount(8) and the descriptions of the acregmin, acregmax, acdirmin, acdirmax, actimeo options for more information.
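
For example, a client that mounts a read-only, rarely changing file system might raise all four attribute cache timeouts to 120 seconds at mount time. The server name, exported path, and mount point in the following sketch are placeholders:

# mount -o ro,actimeo=120 server:/usr/share/man /mnt/man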

9.4.2.4    Decreasing Network Timeouts

NFS does not perform well if it is used over slow network links, congested networks, or wide area networks (WANs). In particular, network timeouts on client systems can severely degrade NFS performance. This condition can be identified by using the nfsstat command and determining the ratio of timeouts to calls. If timeouts are more than 1 percent of the total calls, NFS performance may be severely degraded. See Section 9.4.1.1 for sample nfsstat output of timeout and call statistics.

You can also use the netstat -s command to verify the existence of a timeout problem. A nonzero value in the fragments dropped after timeout field in the ip section of the netstat output may indicate that the problem exists. See Section 10.1.1 for sample netstat command output.

If fragment drops are a problem on a client system, use the rsize=1024 and wsize=1024 options of the mount command to set the NFS read and write buffer sizes to 1 KB, as in the following example.
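
The following sketch shows such a mount; the server name, exported path, and mount point are placeholders:

# mount -o rsize=1024,wsize=1024 server:/export /mnt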