There are various ways that you can manage your disk storage. Depending on your performance and availability needs, you can use static disk partitions, the Logical Storage Manager (LSM), hardware RAID, or a combination of these solutions.
The disk storage configuration can have a significant impact on system performance, because disk I/O is used for file system operations and also by the virtual memory subsystem for paging and swapping.
You may be able to improve disk I/O performance by following the configuration and tuning guidelines described in this chapter, which describes how to perform the following tasks:
Improve performance by efficiently distributing the disk I/O load (Section 8.1)
Obtain information about the disk storage configuration and performance (Section 8.2)
Manage LSM performance (Section 8.3)
Improve hardware RAID subsystem performance (Section 8.4)
Tune the Common Access Method (CAM) (Section 8.5)
To configure a disk storage subsystem that will meet your performance and availability needs, you must first understand your workload resource model, as described in Section 2.1.
Distributing the disk I/O load across devices helps to prevent a single disk, controller, or bus from becoming a bottleneck and also allows simultaneous I/O operations.
For example, if you have 16 GB of disk storage, you may get better performance from sixteen 1-GB disks rather than four 4-GB disks. More spindles (disks) may allow more simultaneous operations. For random I/O operations, 16 disks may be simultaneously seeking instead of 4 disks. For large sequential data transfers, 16 data streams can be simultaneously working instead of 4 data streams.
RAID 0 (disk striping) enables you to efficiently distribute disk data across the disks in a stripe set. See Section 8.3.3 and Section 8.4 for more information.
If you are using
file systems, place the most frequently used file systems on different disks
and optimally on different buses.
Directories containing executable files
or temporary files are often frequently accessed (for example,
/var,
/usr, and
/tmp).
If possible,
place
/usr
and
/tmp
on different disks.
Guidelines for distributing disk I/O also apply to swap devices. See Section 6.2 for more information about configuring swap devices for high performance.
Table 8-1 describes the tools you can use to obtain information about basic disk activity and usage.
| Name | Use | Description |
| sys_check | Analyzes system configuration and displays statistics (Section 4.2) | Creates an HTML file that describes the system configuration, and can be used to diagnose problems. |
| iostat | Displays disk and CPU usage (Section 8.2.1) | Displays transfer statistics for each disk, and the percentage of time the system has spent in user mode, in user mode running at low priority, in system mode, and in idle mode. |
| dbx print nchstats | Reports namei cache statistics (Section 9.1.2) | Reports namei cache statistics, including hit rates. |
| dbx | Reports Common Access Method (CAM) statistics (Section 8.2.2) | Reports CAM statistics, including information about buffers and completed I/O operations. |
| diskx | Tests disk driver functionality | Reads and writes data to disk partitions. |
The
iostat
command reports I/O statistics for terminals,
disks, and the CPU that you can use to diagnose disk I/O performance problems.
The first line of the output is the average since boot time, and
each subsequent report is for the last interval.
You can also specify a disk
name in the command line to output information only about that disk.
An example of the
iostat
command is as follows; output
is provided in one-second intervals:
# /usr/ucb/iostat 1
      tty        fd0        rz0        rz1        dk3        cpu
 tin tout   bps  tps   bps  tps   bps  tps   bps  tps   us ni sy id
   1   73     0    0    23    2    37    3     0    0    5  0 17 79
   0   58     0    0    47    5   204   25     0    0    8  0 14 77
   0   58     0    0     8    1    62    1     0    0   27  0 27 46
   0   58     0    0     8    1     0    0     0    0   22  0 31 46
The
iostat
command output
displays the following information:
For each disk (rzn), the number of kilobytes transferred per second (bps) and the number of transfers per second (tps).
For the system (cpu), the percentage of
time the CPU has spent in user state running processes either at their default
priority or preferred priority (us), in user mode running
processes at a less favored priority (ni), in system mode
(sy), and in idle mode (id).
This information
enables you to determine how disk I/O is affecting the CPU.
User mode includes
the time the CPU spent executing library routines.
System mode includes the
time the CPU spent executing system calls.
The
iostat
command can help you to do the following:
Determine which disk is being used the most and which
is being used the least.
This information will help you determine how to distribute
your file systems and swap space.
Use the
swapon -s
command
to determine which disks are used for swap space.
If the
iostat
command output shows a lot of disk
activity and a high system idle time, the system may be disk bound.
You
may need to balance the disk I/O load, defragment disks, or upgrade your
hardware.
If a disk is doing a large number of transfers (the
tps
field) but reading and writing only small amounts of data (the
bps
field), examine how your applications are doing disk I/O.
The
application may be performing a large number of I/O operations to handle only
a small amount of data.
You may want to rewrite the application if this
behavior is not necessary.
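As a rough sketch of this check, the ratio of the bps and tps values that iostat reports for a disk gives the average number of kilobytes moved per transfer. The following shell function is illustrative only (its name is not a system command), and assumes you pass it the two values from one iostat sample:

```shell
#!/bin/sh
# Illustrative helper (not a system command): given the bps (KB/s) and
# tps (transfers/s) figures that iostat reports for one disk, print the
# average number of kilobytes moved per transfer.
avg_kb_per_transfer() {
    bps=$1   # kilobytes per second for the disk
    tps=$2   # transfers per second for the disk
    if [ "$tps" -eq 0 ]; then
        echo 0          # idle disk; avoid dividing by zero
        return
    fi
    echo $((bps / tps))
}

# Example: a sample line showing 204 KB/s over 25 transfers/s.
avg_kb_per_transfer 204 25
```

A persistently small result combined with a large tps value suggests that the application is issuing many small I/O operations.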
The operating system uses the Common Access Method (CAM) as its interface to the hardware. CAM maintains the following data structures, which you can examine with the dbx debugger:
xpt_qhead--Contains information about
the current size of the buffer pool free list (xpt_nfree),
the current number of processes waiting for buffers (xpt_wait_cnt), and the total number of times that processes had to wait for
free buffers (xpt_times_wait).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print xpt_qhead
struct {
    xws = struct {
        x_flink = 0xffffffff81f07400
        x_blink = 0xffffffff81f03000
        xpt_flags = 2147483656
        xpt_ccb = (nil)
        xpt_nfree = 300
        xpt_nbusy = 0
    }
    xpt_wait_cnt = 0
    xpt_times_wait = 2
    xpt_ccb_limit = 1048576
    xpt_ccbs_total = 300
    x_lk_qhead = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
If the value for the
xpt_wait_cnt
field is not zero,
CAM has run out of buffer pool space.
If this situation persists, you may
be able to eliminate the problem by changing one or more of CAM's I/O attributes
(see
Section 8.5).
ccmn_bp_head--Provides statistics
on the buffer structure pool.
This pool is used for raw I/O to disk.
The
information provided is the current size of the buffer structure
pool (num_bp) and the wait count for buffers (bp_wait_cnt).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ccmn_bp_head
struct {
    num_bp = 50
    bp_list = 0xffffffff81f1be00
    bp_wait_cnt = 0
}
(dbx)
If the value for the
bp_wait_cnt
field is not zero,
CAM has run out of buffer pool space.
If this situation persists, you may
be able to eliminate the problem by changing one or more of the CAM subsystem
attributes (see
Section 8.5).
xpt_cb_queue--Contains the linked list of I/O operations that have been completed and are waiting to be passed back to the peripheral drivers (cam_disk or cam_tape).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print xpt_cb_queue
struct {
    flink = 0xfffffc00004d6828
    blink = 0xfffffc00004d6828
    flags = 0
    initialized = 1
    count = 0
    cplt_lock = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
The
count
field specifies the number of I/O operations
that have been completed and are ready to be passed back to a peripheral device
driver.
Normally, this value is 0 or 1.
If the value of
count
is temporarily greater than 1, it may indicate that a large number of I/O
operations are completing simultaneously.
If the value is consistently greater
than 1, it may indicate a problem.
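The wait-counter checks described above lend themselves to a small script. The following filter is an illustrative sketch (the function name is not a system command): it scans captured dbx output from `print xpt_qhead` or `print ccmn_bp_head` and reports any nonzero wait counter:

```shell
#!/bin/sh
# Illustrative filter (not a system command): scan saved dbx output and
# report a nonzero xpt_wait_cnt or bp_wait_cnt, either of which indicates
# that CAM has run out of buffer pool space.
check_cam_waits() {
    awk '/xpt_wait_cnt|bp_wait_cnt/ {
        if ($3 + 0 > 0)
            print $1 " is " $3 ": CAM ran out of buffer pool space"
        else
            print $1 " is 0: ok"
    }'
}

# Example with captured lines of dbx output:
printf 'xpt_wait_cnt = 0\nbp_wait_cnt = 2\n' | check_cam_waits
```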
The Logical Storage Manager (LSM) can improve system performance and provide high data availability with little additional overhead. LSM also provides you with online storage management features and enhanced performance information and statistics. Although any type of system can benefit from LSM, it is especially suited for configurations with large numbers of disks or configurations that regularly add storage.
LSM allows you to set up a shared storage pool that consists of multiple disks. You can create virtual disks (LSM volumes) from this pool of storage, according to your performance and capacity needs. LSM volumes are used in the same way as disk partitions. You can create UFS file systems and AdvFS file domains and filesets on an LSM volume, use a volume as a raw device, or create LSM volumes on top of RAID storage sets. You can also use LSM on swap disks.
LSM provides you with flexible and easy management for large storage configurations. Because there is no direct correlation between a virtual disk and a physical disk, file system or raw I/O can span disks, as needed. In addition, you can easily add disks to and remove disks from the pool, balance the load, and perform other storage management tasks.
LSM provides more cost-effective RAID functionality than a hardware RAID subsystem. When LSM is used to stripe or mirror disks, it is sometimes referred to as software RAID. LSM configurations are less complex than hardware RAID.
LSM supports the following basic disk management features:
Pool of storage
Load balancing by transparently moving data across disks
Disk concatenation (creating a large volume from multiple disks)
Detailed LSM performance information from the
volstat
command
The following advanced LSM disk management features require a license:
RAID 1 (disk mirroring)
Block-change logging (BCL), which improves the mirrored volume recovery rate
Striped swap disks
Graphical user interface (GUI) for easy disk management and detailed performance information
To obtain the best LSM performance, you must follow the configuration and tuning guidelines. The following sections contain information about:
Guidelines for disks, disk groups, and databases (Section 8.3.1)
Guidelines for mirrored disks (Section 8.3.2)
Guidelines for striped disks (Section 8.3.3)
Monitoring the LSM configuration and performance (Section 8.3.4)
Improving LSM performance (Section 8.3.5)
See the Logical Storage Manager manual for detailed information about using LSM.
There are general recommendations that you can use to configure LSM disks, disk groups, and databases for high performance. Each LSM disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. How you configure your LSM disks, disk groups, and databases determines the flexibility and performance of your LSM configuration.
Table 8-2 describes the LSM disk, disk group, and database configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Initialize your LSM disks as sliced disks (Section 8.3.1.1) | Provides greater storage configuration flexibility | None |
| Make the rootdg disk group a sufficient size (Section 8.3.1.2) | Ensures sufficient space for disk group information | None |
| Use a sufficient private region size for each disk in a disk group (Section 8.3.1.3) | Ensures sufficient space for database copies | Large private regions require more disk space |
| Make the private regions in a disk group the same size (Section 8.3.1.4) | Efficiently utilizes the configuration space | None |
| Organize disks into different disk groups according to function (Section 8.3.1.5) | Allows you to move disk groups between systems | Reduces flexibility when configuring volumes |
| Use an appropriate size and number of database and log copies (Section 8.3.1.6) | Ensures database availability and improves performance | None |
| Place disks containing database and log copies on different buses (Section 8.3.1.7) | Improves availability | Cost of additional hardware |
The following sections describe the previous recommendations in detail.
Initialize your LSM disks as sliced disks, instead of as simple disks. A sliced disk provides greater storage configuration flexibility because the entire disk is under LSM control. The disk label for a sliced disk contains information that identifies the partitions containing the private and the public regions. In contrast, simple disks have both public and private regions in the same partition.
You
must make sure that the
rootdg
disk group has an
adequate size, because the disk group's configuration database contains
records for disks outside of the
rootdg
disk group, in addition to the ordinary disk-group configuration
information.
For example, the
rootdg
configuration database includes disk-access records that define
all disks under LSM control.
The
rootdg
disk group must be large enough to contain
records for the disks in all the disk groups.
See
Table 8-3
for more information.
You must make sure that the private region for each disk has an adequate size. LSM keeps disk media label and configuration database copies in each disk's private region.
A private region must be large enough to accommodate the size of the LSM database copies. In addition, the maximum number of LSM objects (disks, subdisks, volumes, and plexes) in a disk group depends on an adequate private region size. However, a large private region requires more disk space. The default private region size is 1024 blocks, which is usually adequate for configurations using up to 128 disks per disk group.
The private region of each disk in a disk group should be the same size to efficiently utilize the configuration space. One or two LSM configuration database copies can be stored in a disk's private region.
When you add a new disk to an existing LSM disk group, the size of the private region on the new disk is determined by the private region size of the other disks in the disk group.
As you add more disks to a disk group,
the
voldiskadd
utility reduces the number of configuration
copies and log copies that are initialized for the new disks.
See
voldiskadd(8)
for more information.
You may want to organize disks in disk groups according to their function. This enables disk groups to be moved between systems, and decreases the size of the LSM configuration database for each disk group. However, using multiple disk groups reduces flexibility when configuring volumes.
Each disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. The LSM subsystem's overhead primarily involves managing the kernel change logs and copies of the configuration databases.
LSM performance is affected by the size and the number of copies of the configuration database and the kernel change log. They determine the amount of time it takes for LSM to start up, for changes to the configuration to occur, and for the LSM disks to fail over in a cluster.
Usually, each disk in a disk group contains one or two copies of both the kernel change log and the configuration database. However, disk groups consisting of more than eight disks should not keep copies on every disk; instead, configure from four to eight copies for the disk group.
The number of kernel change log copies must be the same as the number of configuration database copies. For the best performance, the number of copies must be the same on each disk that contains copies.
Table 8-3 describes the guidelines for configuration database and kernel change log copies.
| Disks Per Disk Group | Size of Private Region (in Blocks) | Configuration and Kernel Change Log Copies Per Disk |
| 1 to 3 | 512 | Two copies in each private region |
| 4 to 8 | 512 | One copy in each private region |
| 9 to 32 | 512 | One copy on four to eight disks, zero copies on remaining disks |
| 33 to 128 | 1024 | One copy on four to eight disks, zero copies on remaining disks |
| 129 to 256 | 1536 | One copy on four to eight disks, zero copies on remaining disks |
| 257 or more | 2048 | One copy on four to eight disks, zero copies on remaining disks |
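The private-region sizes in Table 8-3 can be encoded as a small lookup. The following shell function is illustrative only (the function name is not an LSM command); it returns the suggested private region size in blocks for a given number of disks per disk group:

```shell
#!/bin/sh
# Illustrative helper encoding Table 8-3: suggested private region size
# (in blocks) for the number of disks in an LSM disk group.
private_region_blocks() {
    disks=$1
    if   [ "$disks" -le 32 ];  then echo 512
    elif [ "$disks" -le 128 ]; then echo 1024
    elif [ "$disks" -le 256 ]; then echo 1536
    else                            echo 2048
    fi
}

# Example: a disk group with 100 disks.
private_region_blocks 100
```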
For disk groups with large numbers of disks, place the disks that contain configuration database and kernel change log copies on different buses. This provides you with better performance and higher availability.
Use LSM mirrored volumes (RAID 1) for high data availability. A plex is a complete copy of a volume's data; to mirror a volume, you must set up at least two plexes. If a physical disk fails, the plex containing the failed disk becomes temporarily unavailable, but the remaining plexes are still available.
Mirroring can also improve read performance. However, a write to a mirrored volume results in parallel writes to each plex, so mirroring will degrade disk write performance. Environments whose disk I/O operations are predominantly reads obtain the best performance results from mirroring. See Table 8-4 for mirrored volume guidelines.
Use block-change logging (BCL) to improve the mirrored volume recovery rate when a system failure occurs by reducing the synchronization time. If BCL is enabled and a write is made to a mirrored plex, BCL identifies the block numbers that have changed and then stores the numbers on a logging subdisk. BCL is not used for reads.
BCL is enabled if two or more plexes in a mirrored volume have a logging subdisk associated with them. Only one logging subdisk can be associated with a plex.
BCL can add some overhead to your system and degrade the mirrored volume's write performance. However, the impact is less for systems under a heavy I/O load, because multiple writes to the log are batched into a single write. See Table 8-5 for BCL configuration guidelines.
Note
BCL will be replaced by dirty region logging (DRL) in a future release.
You may want to combine mirroring (RAID 1) with striping (RAID 0) to provide high availability and balance the disk I/O load. See Section 8.3.3 for striping guidelines.
Table 8-4 describes LSM mirrored volume configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Map mirrored plexes across different buses (Section 8.3.2.1) | Improves performance and increases availability | None |
| Use the appropriate read policy (Section 8.3.2.2) | Efficiently distributes reads | None |
| Attach up to eight plexes to the same volume (Section 8.3.2.3) | Improves performance for read-intensive workloads and increases availability | Uses disk space inefficiently |
| Use a symmetrical configuration (Section 8.3.2.4) | Provides more predictable performance | None |
| Use block-change logging (BCL) (Table 8-5) | Improves mirrored volume recovery rate | May decrease write performance |
| Stripe the mirrored volumes (Table 8-6) | Improves disk I/O performance and balances I/O load | Increases management complexity |
Table 8-5 describes LSM block-change logging (BCL) configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Configure multiple logging subdisks (Section 8.3.2.5) | Improves recovery time | Requires additional disks |
| Use a write-back cache for logging subdisks (Section 8.3.2.6) | Minimizes BCL's write degradation | Cost of hardware RAID subsystem |
| Use the appropriate BCL subdisk size (Section 8.3.2.7) | Enables migration to dirty region logging | None |
| Place logging subdisks on infrequently used disks (Section 8.3.2.8) | Helps to prevent disk bottlenecks | None |
| Use solid-state disks for logging subdisks (Section 8.3.2.9) | Minimizes BCL's write degradation | Cost of solid-state disks |
The following sections describe the previous LSM mirrored volume and BCL recommendations in detail.
Putting each mirrored plex on a different bus improves performance by enabling simultaneous I/O operations. Mirroring across different buses also increases availability by protecting against bus and adapter failure.
To provide optimal performance for different types of mirrored volumes, LSM supports the following read policies:
Round-robin read
Satisfies read operations to the volume in a round-robin manner from all plexes in the volume.
Preferred read
Satisfies read operations from one specific plex (usually the plex with the highest performance).
Select
Selects a default read policy based on the plex associations to the volume. If the mirrored volume contains a single, enabled, striped plex, the default is to prefer that plex. For any other set of plex associations, the default is to use a round-robin policy.
If one plex exhibits superior performance, either because the plex is striped across multiple disks or because it is located on a much faster device, then set the read policy to preferred read for that plex. By default, a mirrored volume with one striped plex should have the striped plex configured as the preferred read. Otherwise, you should use the round-robin read policy.
To improve performance for read-intensive workloads, up to eight plexes can be attached to the same mirrored volume. However, this configuration does not use disk space efficiently.
A symmetrical mirrored disk configuration provides predictable performance and easy management. Use the same number of disks in each mirrored plex. For mirrored striped volumes, you can stripe across half of the available disks to form one plex and across the other half to form the other plex.
Using multiple block-change logging (BCL) subdisks will improve recovery time after a failure.
To minimize BCL's impact on write performance, use LSM in conjunction with a RAID subsystem that has a write-back cache. Typically, the BCL performance degradation is more significant on systems with few writes than on systems with heavy write loads.
To support migration from BCL to dirty region logging (DRL), which will be supported in a future release, use the appropriate BCL subdisk size.
If you have less than 64 GB of disk space under LSM control, start with 1 block for each gigabyte of storage. If the result is an odd number, add 1 block; if the result is an even number, add 2 blocks.
For example, if you have 1 GB (or less) of space, use a 2-block subdisk. If you have 2 GB (or 3 GB) of space, use a 4-block subdisk.
If you have more than 64 GB of disk space under LSM control, use a 64-block subdisk.
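The sizing rule above can be sketched as a shell function. This is illustrative only (the function name is not an LSM command), and treating exactly 64 GB the same as the "more than 64 GB" case is an assumption, since the text does not cover that boundary:

```shell
#!/bin/sh
# Illustrative helper (not an LSM command): BCL logging subdisk size in
# blocks for a given number of gigabytes under LSM control.
# Assumption: exactly 64 GB is treated like the "more than 64 GB" case.
bcl_subdisk_blocks() {
    gb=$1
    if [ "$gb" -ge 64 ]; then
        echo 64
        return
    fi
    if [ $((gb % 2)) -eq 1 ]; then
        echo $((gb + 1))    # odd result: add 1 block
    else
        echo $((gb + 2))    # even result: add 2 blocks
    fi
}

# Example: 3 GB of space under LSM control.
bcl_subdisk_blocks 3
```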
Place logging subdisks on infrequently used disks. Because these subdisks are frequently written, do not put them on busy disks. In addition, do not configure BCL subdisks on the same disks as the volume data, because this will cause head seeking or thrashing.
If persistent (nonvolatile) solid-state disks are available, use them for logging subdisks.
Striping volumes (RAID 0) with LSM enables parallel I/O streams to operate concurrently on separate devices, which distributes the disk I/O load and improves performance. Striping is especially effective for applications that perform large sequential data transfers or multiple, simultaneous I/O operations.
Striping distributes data in fixed-size portions (stripes) across the disks in a volume. The stripes are interleaved across the striped plex's subdisks, which are located on different disks to evenly distribute the disk I/O.
The performance benefit of striping depends on the stripe width, which is the number of blocks in a stripe, and how your users and applications perform I/O. Bandwidth increases with the number of disks across which a plex is striped.
You can combine mirroring (RAID 1) with striping to obtain high availability. However, mirroring will decrease write performance. See Section 8.3.2 for mirroring guidelines.
Table 8-6 describes the LSM striped volume configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Use multiple disks in a striped volume (Section 8.3.3.1) | Improves performance | Decreases volume reliability |
| Distribute subdisks across different disks and buses (Section 8.3.3.2) | Improves performance and increases availability | None |
| Use the appropriate stripe width (Section 8.3.3.3) | Improves performance | None |
| Avoid splitting small data transfers (Section 8.3.3.3) | Improves the performance of volumes that quickly receive multiple data transfers | May use disk space inefficiently |
| Split large individual data transfers (Section 8.3.3.3) | Improves the performance of volumes that receive large data transfers | Decreases throughput |
The following sections describe the previous LSM striped volume configuration recommendations in detail.
Increasing the number of disks in a striped volume can increase the bandwidth, depending on the applications and file systems you are using and on the number of simultaneous users. However, this reduces the effective mean-time-between-failures (MTBF) of the volume. If this reduction is a problem, use both mirroring and striping. See Section 8.3.2 for mirroring guidelines.
Distribute the subdisks of a striped volume across different buses. This improves performance and helps to prevent a single bus from becoming a bottleneck.
The performance benefit of striping depends on the size of the stripe width and the characteristics of the I/O load. Stripes of data are allocated alternately and evenly to the subdisks of a striped plex. A striped plex consists of a number of equal-sized subdisks located on different disks.
The number of blocks in a stripe determines the stripe width. LSM uses a default stripe width of 64 KB (or 128 sectors), which works well in most environments.
Use the
volstat
command to determine the number of
data transfer splits.
For volumes that receive only small I/O transfers, you
may not want to use striping because disk access time is important.
Striping
is most beneficial for large data transfers.
To improve performance of large sequential data transfers, use a stripe width that will divide each individual data transfer and distribute the blocks equally across the disks.
To improve the performance of multiple simultaneous small data transfers, make the stripe width the same size as the data transfer. However, an excessively small stripe width can result in poor system performance.
If you are striping mirrored volumes, ensure that the stripe width is the same for each plex.
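For large sequential transfers, the guideline above amounts to dividing the typical transfer size evenly across the disks in the striped plex. The following shell function is an illustrative sketch (its name is not an LSM command), and assumes the transfer size is an even multiple of the disk count:

```shell
#!/bin/sh
# Illustrative calculation (not an LSM command): stripe width in KB that
# splits one large data transfer evenly across the disks in a striped plex.
stripe_width_kb() {
    xfer_kb=$1    # typical size of one data transfer, in KB
    ndisks=$2     # number of disks in the striped plex
    echo $((xfer_kb / ndisks))
}

# Example: a 512-KB sequential transfer across a 4-disk stripe set.
stripe_width_kb 512 4
```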
Table 8-7 describes the tools you can use to obtain information about the Logical Storage Manager (LSM).
| Name | Use | Description |
| volprint | Displays LSM configuration information (Section 8.3.4.1) | Displays information about LSM disk groups, disk media, volumes, plexes, and subdisk records. It does not display disk access records. See volprint(8). |
| volstat | Monitors LSM performance statistics (Section 8.3.4.2) | Displays performance statistics since boot time for all LSM objects (volumes, plexes, subdisks, and disks). These statistics include information about read and write operations, including the total number of operations, the number of failed operations, the number of blocks read or written, and the average time spent on the operation in a specified interval of time. See volstat(8). |
| voltrace | Tracks LSM operations (Section 8.3.4.3) | Sets I/O tracing masks against one or all volumes in the LSM configuration and logs the results to the LSM default event log, /dev/volevent. See voltrace(8). |
| volwatch | Monitors LSM events (Section 8.3.4.4) | Monitors LSM for failures in disks, volumes, and plexes, and sends mail if a failure occurs. See volwatch(8). |
| dxlsm | Monitors LSM objects (Section 8.3.4.5) | Using the Analyze menu, displays information about LSM disks, volumes, and subdisks. See dxlsm(8X). |
The following sections describe some of these commands in detail.
The
volprint
utility displays information from records in the LSM configuration
database.
You can select the records to be displayed by name or by using
special search expressions.
In addition, you can display record association
hierarchies, so that the structure of records is more apparent.
Use the
volprint
utility to display disk group, disk
media, volume, plex, and subdisk records.
Invoke the
voldisk list
command to display disk access records or physical disk information.
The following example uses the
volprint
utility to
show the status of the
voldev1
volume:
# /usr/sbin/volprint -ht voldev1
DG NAME       GROUP-ID
DM NAME       DEVICE     TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME       USETYPE    KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME       VOLUME     KSTATE   STATE    LENGTH   LAYOUT    ST-WIDTH  MODE
SD NAME       PLEX       PLOFFS   DISKOFFS LENGTH   DISK-NAME DEVICE

v  voldev1    fsgen      ENABLED  ACTIVE   804512   SELECT    -
pl voldev1-01 voldev1    ENABLED  ACTIVE   804512   CONCAT    -         RW
sd rz8-01     voldev1-01 0        0        804512   rz8       rz8
pl voldev1-02 voldev1    ENABLED  ACTIVE   804512   CONCAT    -         RW
sd dev1-01    voldev1-02 0        2295277  402256   dev1      rz9
sd rz15-02    voldev1-02 402256   2295277  402256   rz15      rz15
See
volprint(8)
for more information.
The
volstat
utility provides information about activity on volumes,
plexes, subdisks, and disks under LSM control.
It reports statistics that
reflect the activity levels of LSM objects since boot time.
The amount of information displayed depends on which options you specify
to the
volstat
utility.
For example, you can display statistics
for a specific LSM object, or you can display statistics for all objects at
one time.
If you specify a disk group, only statistics for objects in that
disk group are displayed.
If you do not specify a particular disk group,
the
volstat
utility displays statistics for the default
disk group (rootdg).
You can also use the
volstat
utility to reset the
base statistics to zero.
This can be done for all objects or for only
specified objects.
Resetting the statistics to zero before a particular
operation makes it possible to measure the subsequent impact of that
operation.
The following example uses the
volstat
utility to
display statistics on LSM volumes:
# /usr/sbin/volstat
                  OPERATIONS          BLOCKS         AVG TIME(ms)
TYP NAME       READ     WRITE     READ     WRITE     READ   WRITE
vol archive     865       807     5722      3809     32.5    24.0
vol home       2980      5287     6504     10550     37.7   221.1
vol local     49477     49230   507892    204975     28.5    33.5
vol src       79174     23603   425472    139302     22.4    30.9
vol swapvol   22751     32364   182001    258905     25.3   323.2
See
volstat(8)
for more information.
The
voltrace
utility reads an event log (/dev/volevent) and prints
formatted event log records to standard output.
Using the
voltrace
utility, you can set event trace masks to determine the type
of events to track.
For example, you can trace I/O events, configuration
changes, or I/O errors.
The following example uses the
voltrace
utility
to display status on all new events:
# /usr/sbin/voltrace -n -e all
18446744072623507277 IOTRACE 439: req 3987131 v:rootvol p:rootvol-01 \
    d:root_domain s:rz3-02 iot write lb 0 b 63120 len 8192 tm 12
18446744072623507277 IOTRACE 440: req 3987131 \
    v:rootvol iot write lb 0 b 63136 len 8192 tm 12
See
voltrace(8)
for more information.
The
volwatch
shell script is automatically started when you install LSM.
This
script sends mail to root if certain LSM configuration events occur, such
as a plex detach caused by a disk failure.
The script sends mail to root
by default.
You also can specify another mail recipient.
See
volwatch(8)
for more information.
The LSM Visual Administrator, dxlsm, is a graphical user interface (GUI) that allows you to manipulate LSM objects and manage the LSM configuration.
The
dxlsm
GUI also includes an Analyze menu that allows you to display statistics
about volumes, LSM disks, and subdisks.
The information is graphically displayed,
using colors and patterns on the disk icons, and numerically, using the
Analysis
Statistics
form.
You can use the
Analysis
Parameters
form to customize the displayed
information.
See the
Logical Storage Manager
manual and
dxlsm(8X)
for more information.
You may be able to improve LSM performance by modifying an LSM subsystem attribute or by performing some administrative tasks. Be sure you have followed the configuration guidelines that are described in Section 8.3.1, Section 8.3.2, and Section 8.3.3.
You can improve LSM performance as follows:
Increase the maximum number of LSM volumes
For large systems, increase the value of the lsm subsystem attribute
max-vol, which specifies the maximum number of volumes per system. The
default is 1024; you can increase it to 4096. See Section 4.4 for
information about modifying kernel subsystem attributes.
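For example, a stanza along the following lines in /etc/sysconfigtab (a
sketch only; Section 4.4 describes the supported procedure for modifying
subsystem attributes) would raise the limit:

```
lsm:
    max-vol = 4096
```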
Balance the I/O load
LSM allows you to achieve a fine level of granularity in data placement, because LSM provides a way for volumes to be distributed across multiple disks. After measuring actual data-access patterns, you can adjust the placement of file systems.
You can reassign data to specific disks to balance the I/O load among the available storage devices. After performance patterns have been established, you can reconfigure volumes online without adversely impacting volume availability.
Stripe frequently accessed data
If you have frequently accessed file systems or databases, you can realize significant performance benefits by striping the data across multiple disks, which increases bandwidth to this data. See Section 8.3.3 for information.
Set the preferred read policy to the fastest mirrored plex
If one plex of a mirrored volume exhibits superior performance, either because it is striped or concatenated across multiple disks or because it is located on a much faster device, set the volume's read policy to prefer that plex. By default, a mirrored volume with one striped plex should be configured with the striped plex as the preferred read plex. See Section 8.3.2.2 for more information.
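As a sketch, setting the preferred plex might look like the following
(the command path and the volume and plex names here are hypothetical;
see the Logical Storage Manager manual for the exact syntax):

```
# /usr/sbin/volume rdpol prefer vol01 vol01-02
```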
You may be able to improve the performance of LSM systems that use large
amounts of memory or disks by increasing the value of the
volinfo.max_io kernel variable. See Section 4.4.6 for information about
using dbx to display and modify kernel variables.
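A hedged sketch of such a dbx session follows (the value shown is purely
illustrative, not a recommendation; Section 4.4.6 describes the
supported procedure):

```
# dbx -k /vmunix /dev/mem
(dbx) print volinfo.max_io
(dbx) assign volinfo.max_io = 4096
```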
Hardware RAID subsystems provide RAID functionality for high performance and high availability, relieve the CPU of disk I/O overhead, and enable you to connect many disks to a single I/O bus. There are various types of hardware RAID subsystems with different performance and availability features, but they all include a RAID controller, disks in enclosures, cabling, and disk management software.
RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features, such as write-back caches and complete component redundancy.
Hardware RAID subsystems use disk management software, such as the RAID Configuration Utility (RCU) and the StorageWorks Command Console (SWCC) utility, to manage the RAID devices. Menu-driven interfaces allow you to select RAID levels.
Use hardware RAID to combine multiple disks into a single storage set that the system sees as a single unit. A storage set can consist of a simple set of disks, a striped set, a mirrored set, or a RAID set. You can create LSM volumes, AdvFS file domains, or UFS file systems on a storage set, or you can use the storage set as a raw device.
All hardware RAID subsystems provide you with the following features:
A RAID controller that relieves the CPU of the disk I/O overhead
Increased disk storage capacity
Hardware RAID subsystems allow you to connect a large number of disks to a single I/O bus. In a typical storage configuration, you use a SCSI bus connected to an I/O bus slot to attach disks to a system. However, systems have limited I/O bus slots, and you can connect only a limited number of disks to a SCSI bus (eight for SCSI-2 and sixteen for SCSI-3).
Hardware RAID subsystems contain multiple internal SCSI buses and host bus adapters, and require only one I/O bus to connect the subsystem to a system.
Read cache
A read cache improves I/O read performance by holding data that it anticipates the host will request. If a system requests data that is already in the read cache (a cache hit), the data is immediately supplied without having to read the data from disk. Subsequent data modifications are written both to disk and to the read cache (write-through caching).
Write-back cache
Hardware RAID subsystems support (as a standard or an optional feature) a write-back cache, which can improve I/O write performance while maintaining data integrity. A write-back cache decreases the latency of many small writes, and can improve Internet server performance because writes appear to be executed immediately. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache, consolidated, and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache must be battery-backed to protect against data loss and corruption.
RAID support
All hardware RAID subsystems support RAID 0 (disk striping), RAID 1 (disk mirroring), and RAID 5. High-performance RAID array subsystems also support RAID 3 and dynamic parity RAID. See Section 1.3.3.1 for information about RAID levels.
Non-RAID disk array capability or "just a bunch of disks" (JBOD)
Component hot swapping and hot sparing
Hot swap support allows you to replace a failed component while the system continues to operate. Hot spare support allows you to automatically use previously installed components if a failure occurs.
Graphical user interface (GUI) for easy management and monitoring
There are various types of hardware RAID subsystems, which provide different degrees of performance and availability at various costs:
Backplane RAID array storage subsystems
These entry-level subsystems, such as those utilizing the RAID Array 230/Plus storage controller, provide a low-cost hardware RAID solution and are designed for small and midsize departments and workgroups.
A backplane RAID array storage controller is installed in an I/O bus slot, either a PCI bus slot or an EISA bus slot, and acts as both a host bus adapter and a RAID controller.
Backplane RAID array subsystems provide RAID functionality (0, 1, 0+1, and 5), an optional write-back cache, and hot swap functionality.
High-performance RAID array subsystems
These subsystems, such as the RAID Array 450 subsystem, provide extensive performance and availability features and are designed for client/server, data center, and medium to large departmental environments.
A high-performance RAID array controller, such as an HSZ50 controller, is connected to a system through a FWD SCSI bus and a high-performance host bus adapter installed in an I/O bus slot.
High-performance RAID array subsystems provide RAID functionality (0, 1, 0+1, 3, 5, and dynamic parity RAID), dual-redundant controller support, scalability, storage set partitioning, multipath concurrent access, a standard battery-backed write-back cache, and hot-swappable components.
Enterprise Storage Arrays (ESA)
These preconfigured high-performance hardware RAID subsystems, such as the RAID Array 7000, provide the highest performance, availability, and disk capacity of any RAID subsystem. They are used for high-volume transaction-oriented applications and high-bandwidth decision-support applications.
ESAs support all major RAID levels, including dynamic parity RAID; fully redundant and hot-swappable components; a standard battery-backed write-back cache; and centralized storage management.
See the Compaq Systems & Options Catalog for detailed information about hardware RAID subsystem features.
Table 8-8 describes the hardware RAID subsystem configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Performance Benefit | Tradeoff |
| Evenly distribute disks in a storage set across different buses (Section 8.4.1) | Improves performance and helps to prevent bottlenecks | None |
| Configure devices for multipath concurrent access | Improves throughput and increases availability | Cost of devices that support this feature |
| Use disks with the same data capacity in each storage set (Section 8.4.2) | Simplifies storage management | None |
| Use an appropriate stripe size (Section 8.4.3) | Improves performance | None |
| Mirror striped sets (Section 8.4.4) | Provides availability and distributes disk I/O performance | Increases configuration complexity and may decrease write performance |
| Use a write-back cache (Section 8.4.5) | Improves write performance | Cost of hardware |
| Use dual-redundant RAID controllers (Section 8.4.6) | Improves performance, increases availability, and prevents I/O bus bottlenecks | Cost of hardware |
| Install spare disks (Section 8.4.7) | Improves availability | Cost of disks |
| Replace failed disks promptly (Section 8.4.7) | Improves performance | None |
The following sections describe some of these guidelines. See your RAID subsystem documentation for detailed configuration information.
You can improve performance and help to prevent bottlenecks by distributing storage set disks evenly across different buses.
In addition, make sure that the first member of each mirrored set is on a different bus.
Use disks with the same capacity in the same storage set. This simplifies storage management.
You must understand how your applications perform disk I/O before you can choose the stripe (chunk) size that will provide the best performance benefit. See Section 2.1 for information about determining a resource model for your system.
If the stripe size is large compared to the average I/O size, each disk in a stripe set can respond to a separate data transfer. I/O operations can then be handled in parallel, which increases sequential write performance and throughput. This can improve performance for environments that perform large numbers of I/O operations, including transaction processing, office automation, and file services environments, and for environments that perform multiple random read and write operations.
If the stripe size is smaller than the average I/O operation, multiple disks can simultaneously handle a single I/O operation, which can increase bandwidth and improve sequential file processing. This is beneficial for image processing and data collection environments. However, do not make the stripe size so small that it will degrade performance for large sequential data transfers.
For example, if you use an 8-KB stripe size, small data transfers will be distributed evenly across the member disks, but a 64-KB data transfer will be divided into at least eight data transfers.
If your applications are doing I/O to a raw device and not a file system, use a stripe size that distributes a single data transfer evenly across the member disks. For example, if the typical I/O size is 1 MB and you have a four-disk array, you could use a 256-KB stripe size. This would distribute the data evenly among the four member disks, with each doing a single 256-KB data transfer in parallel.
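As a quick check of this arithmetic, the following shell fragment
(illustrative only; the sizes match the example above) computes how many
member-disk transfers a single raw I/O generates for a given stripe
size:

```shell
# Illustrative arithmetic only: split one raw I/O across a stripe set.
io_size=$((1024 * 1024))     # typical raw I/O size: 1 MB
stripe_size=$((256 * 1024))  # stripe (chunk) size: 256 KB
disks=4                      # member disks in the array

# Number of per-disk transfers that one I/O is divided into
transfers=$(( (io_size + stripe_size - 1) / stripe_size ))
echo "transfers per I/O: $transfers"  # one 256-KB transfer per member disk
```

With these sizes, the single 1-MB transfer is divided into four 256-KB
transfers, one per member disk, all proceeding in parallel.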
For small file system I/O operations, use a stripe size that is a multiple of the typical I/O size (for example, four to five times the I/O size). This will help to ensure that the I/O is not split across disks.
You may want to choose a stripe size that will prevent any particular range of blocks from becoming a bottleneck. For example, if an application often uses a particular 8-KB block, you may want to use a stripe size that is slightly larger or smaller than 8 KB or is a multiple of 8 KB to force the data onto a different disk.
Striped disks improve I/O performance by distributing the disk I/O load. Mirroring striped disks provides high availability, but can decrease write performance, because each write operation results in two disk writes.
RAID subsystems support, either as a standard or an optional feature, a nonvolatile (battery-backed) write-back cache that can improve disk I/O performance while maintaining data integrity. A write-back cache improves performance for systems that perform large numbers of writes. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache improves performance because writes appear to be executed immediately. If a failure occurs, upon recovery the RAID controller detects any unwritten data that still exists in the write-back cache and writes the data to disk before enabling normal controller operations.
A write-back cache must be battery-backed to protect against data loss and corruption.
If you are using an HSZ40 or HSZ50 RAID controller with a write-back cache, the following guidelines may improve performance:
Set CACHE_POLICY to B.
Set CACHE_FLUSH_TIMER to a minimum of 45 (seconds).
Enable the write-back cache (WRITEBACK_CACHE) for each unit, and set the value of MAXIMUM_CACHED_TRANSFER_SIZE to a minimum of 256.
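On the controller's command line interface, these settings correspond to
commands along the following lines (a sketch only; the unit name D100 is
hypothetical, and the exact syntax is given in the controller
documentation):

```
SET THIS_CONTROLLER CACHE_POLICY=B
SET THIS_CONTROLLER CACHE_FLUSH_TIMER=45
SET D100 WRITEBACK_CACHE
SET D100 MAXIMUM_CACHED_TRANSFER_SIZE=256
```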
See the RAID subsystem documentation for more information about using the write-back cache.
If supported by your RAID subsystem, you can use a dual-redundant controller configuration and balance the number of disks across the two controllers. This can improve performance, increase availability, and prevent I/O bus bottlenecks.
Install predesignated spare disks on separate controller ports and storage shelves. This will help you to maintain data availability and recover quickly if a disk failure occurs.
The Common Access Method (CAM) is the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1 KB of physical memory. Monitor these pools (see Section 8.2.2) and tune them if necessary.
You can modify the following io subsystem attributes to improve CAM
performance:
cam_ccb_pool_size--The initial size of the buffer pool free list at boot
time. The default is 200.
cam_ccb_low_water--The number of buffers in the pool free list at which
more buffers are allocated from the kernel. CAM reserves this number of
buffers to ensure that the kernel always has enough memory to shut down
runaway processes. The default is 100.
cam_ccb_increment--The number of buffers added to or removed from the
buffer pool free list. Buffers are allocated on an as-needed basis to
handle immediate demands, but are released in a more measured manner to
guard against spikes. The default is 50.
If the I/O pattern associated with your system tends to have
intermittent bursts of I/O operations (I/O spikes), increasing the
values of the cam_ccb_pool_size and cam_ccb_increment attributes may
improve performance.
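For example, a stanza along these lines in /etc/sysconfigtab (the values
are illustrative, not recommendations) doubles both attributes relative
to their defaults:

```
io:
    cam_ccb_pool_size = 400
    cam_ccb_increment = 100
```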