8    Managing Disk Storage Performance

There are various ways that you can manage your disk storage. Depending on your performance and availability needs, you can use static disk partitions, the Logical Storage Manager (LSM), hardware RAID, or a combination of these solutions.

The disk storage configuration can have a significant impact on system performance, because disk I/O is used for file system operations and also by the virtual memory subsystem for paging and swapping.

You may be able to improve disk I/O performance by following the configuration and tuning guidelines described in this chapter.

To configure a disk storage subsystem that will meet your performance and availability needs, you must first understand your workload resource model, as described in Section 2.1.

8.1    Distributing the Disk I/O Load

Distributing the disk I/O load across devices helps to prevent a single disk, controller, or bus from becoming a bottleneck and also allows simultaneous I/O operations.

For example, if you have 16 GB of disk storage, you may get better performance from sixteen 1-GB disks than from four 4-GB disks. More spindles (disks) allow more simultaneous operations. For random I/O operations, 16 disks can be seeking simultaneously instead of 4. For large sequential data transfers, 16 data streams can be working simultaneously instead of 4.

RAID 0 (disk striping) enables you to efficiently distribute disk data across the disks in a stripe set. See Section 8.3.3 and Section 8.4 for more information.

If you are using file systems, place the most frequently used file systems on different disks and optimally on different buses. Directories containing executable files or temporary files are often frequently accessed (for example, /var, /usr, and /tmp). If possible, place /usr and /tmp on different disks.

Guidelines for distributing disk I/O also apply to swap devices. See Section 6.2 for more information about configuring swap devices for high performance.

8.2    Gathering Basic Disk Information

Table 8-1 describes the tools you can use to obtain information about basic disk activity and usage.

Table 8-1:  Basic Disk Monitoring Tools

Name Use Description

sys_check

Analyzes system configuration and displays statistics (Section 4.2)

Creates an HTML file that describes the system configuration, and can be used to diagnose problems. The sys_check utility checks kernel variable settings and memory and CPU resources, and provides performance data and lock statistics for SMP systems and kernel profiles.

The sys_check utility calls other commands and utilities to perform a basic analysis of your configuration and kernel variable settings and provides warnings and tuning recommendations if necessary. See sys_check(8) for more information.

iostat

Displays disk and CPU usage (Section 8.2.1)

Displays transfer statistics for each disk, and the percentage of time the system has spent in user mode, in user mode running low priority (nice) processes, in system mode, and in idle mode.

(dbx) print nchstats

Reports namei cache statistics (Section 9.1.2)

Reports namei cache statistics, including hit rates.

(dbx) print xpt_qhead, ccmn_bp_head, and xpt_cb_queue

Reports Common Access Method (CAM) statistics (Section 8.2.2)

Reports CAM statistics, including information about buffers and completed I/O operations.

diskx

Tests disk driver functionality

Reads and writes data to disk partitions. The diskx exerciser analyzes data transfer performance, verifies the disktab database file entry, and tests reads, writes, and seeks. The diskx exerciser can destroy the contents of a partition. See diskx(8) for more information.

8.2.1    Displaying Disk Usage with the iostat Command

The iostat command reports I/O statistics for terminals, disks, and the CPU that you can use to diagnose disk I/O performance problems. The first line of the output is the average since boot time, and each subsequent report is for the last interval. You can also specify a disk name in the command line to output information only about that disk.

An example of the iostat command is as follows; output is provided in one-second intervals:


# /usr/ucb/iostat 1
      tty     fd0      rz0      rz1      dk3     cpu
 tin tout bps tps  bps tps  bps tps  bps tps  us ni sy id
   1   73   0   0   23   2   37   3    0   0   5  0 17 79
   0   58   0   0   47   5  204  25    0   0   8  0 14 77
   0   58   0   0    8   1   62   1    0   0  27  0 27 46
   0   58   0   0    8   1    0   0    0   0  22  0 31 46

The iostat command output displays terminal activity (tin and tout), throughput (bps) and transfers per second (tps) for each disk, and the percentage of CPU time spent in user (us), nice (ni), system (sy), and idle (id) modes.

The iostat command can help you to identify disks with heavy I/O loads so that you can distribute the load more evenly across devices.

8.2.2    Monitoring CAM by Using the dbx Debugger

The operating system uses the Common Access Method (CAM) as the operating system interface to the hardware. CAM maintains data structures, such as xpt_qhead, ccmn_bp_head, and xpt_cb_queue, that you can examine with the dbx print command (see Table 8-1).

8.3    Managing Logical Storage Manager Performance

The Logical Storage Manager (LSM) can improve system performance and provide high data availability with little additional overhead. LSM also provides you with online storage management features and enhanced performance information and statistics. Although any type of system can benefit from LSM, it is especially suited for configurations with large numbers of disks or configurations that regularly add storage.

LSM allows you to set up a shared storage pool that consists of multiple disks. You can create virtual disks (LSM volumes) from this pool of storage, according to your performance and capacity needs. LSM volumes are used in the same way as disk partitions. You can create UFS file systems and AdvFS file domains and filesets on an LSM volume, use a volume as a raw device, or create LSM volumes on top of RAID storage sets. You can also use LSM on swap disks.

LSM provides you with flexible and easy management for large storage configurations. Because there is no direct correlation between a virtual disk and a physical disk, file system or raw I/O can span disks, as needed. In addition, you can easily add disks to and remove disks from the pool, balance the load, and perform other storage management tasks.

LSM provides more cost-effective RAID functionality than a hardware RAID subsystem. When LSM is used to stripe or mirror disks, it is sometimes referred to as software RAID. LSM configurations are less complex than hardware RAID.

LSM supports basic disk management features without an additional license; advanced features, such as mirroring and striping, require an LSM license.

To obtain the best LSM performance, follow the configuration and tuning guidelines described in the following sections.

See the Logical Storage Manager manual for detailed information about using LSM.

8.3.1    Basic LSM Configuration Recommendations

There are general recommendations that you can use to configure LSM disks, disk groups, and databases for high performance. Each LSM disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. How you configure your LSM disks, disk groups, and databases determines the flexibility and performance of your LSM configuration.

Table 8-2 describes the LSM disk, disk group, and database configuration recommendations and lists performance benefits as well as tradeoffs.

Table 8-2:  LSM Disk, Disk Group, and Database Configuration Guidelines

Recommendation Benefit Tradeoff
Initialize your LSM disks as sliced disks (Section 8.3.1.1) Provides greater storage configuration flexibility None
Make the rootdg disk group a sufficient size (Section 8.3.1.2) Ensures sufficient space for disk group information None
Use a sufficient private region size for each disk in a disk group (Section 8.3.1.3) Ensures sufficient space for database copies Large private regions require more disk space
Make the private regions in a disk group the same size (Section 8.3.1.4) Efficiently utilizes the configuration space None
Organize disks into different disk groups according to function (Section 8.3.1.5) Allows you to move disk groups between systems Reduces flexibility when configuring volumes
Use an appropriate size and number of database and log copies (Section 8.3.1.6) Ensures database availability and improves performance None
Place disks containing database and log copies on different buses (Section 8.3.1.7) Improves availability Cost of additional hardware

The following sections describe the previous recommendations in detail.

8.3.1.1    Initializing LSM Disks as Sliced Disks

Initialize your LSM disks as sliced disks, instead of as simple disks. A sliced disk provides greater storage configuration flexibility because the entire disk is under LSM control. The disk label for a sliced disk contains information that identifies the partitions containing the private and the public regions. In contrast, simple disks have both public and private regions in the same partition.

8.3.1.2    Sizing the rootdg Disk Group

You must make sure that the rootdg disk group has an adequate size, because the disk group's configuration database contains records for disks outside of the rootdg disk group, in addition to the ordinary disk-group configuration information. For example, the rootdg configuration database includes disk-access records that define all disks under LSM control.

The rootdg disk group must be large enough to contain records for the disks in all the disk groups. See Table 8-3 for more information.

8.3.1.3    Sizing Private Regions

You must make sure that the private region for each disk has an adequate size. LSM keeps disk media label and configuration database copies in each disk's private region.

A private region must be large enough to accommodate the size of the LSM database copies. In addition, the maximum number of LSM objects (disks, subdisks, volumes, and plexes) in a disk group depends on an adequate private region size. However, a large private region requires more disk space. The default private region size is 1024 blocks, which is usually adequate for configurations using up to 128 disks per disk group.

8.3.1.4    Making Private Regions in a Disk Group the Same Size

The private region of each disk in a disk group should be the same size to efficiently utilize the configuration space. One or two LSM configuration database copies can be stored in a disk's private region.

When you add a new disk to an existing LSM disk group, the size of the private region on the new disk is determined by the private region size of the other disks in the disk group. As you add more disks to a disk group, the voldiskadd utility reduces the number of configuration copies and log copies that are initialized for the new disks. See voldiskadd(8) for more information.

8.3.1.5    Organizing Disks in Disk Groups

You may want to organize disks in disk groups according to their function. This enables disk groups to be moved between systems, and decreases the size of the LSM configuration database for each disk group. However, using multiple disk groups reduces flexibility when configuring volumes.

8.3.1.6    Choosing the Correct Number and Size of the Database and Log Copies

Each disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. The LSM subsystem's overhead primarily involves managing the kernel change logs and copies of the configuration databases.

LSM performance is affected by the size and the number of copies of the configuration database and the kernel change log. They determine the amount of time it takes for LSM to start up, for changes to the configuration to occur, and for the LSM disks to fail over in a cluster.

Usually, each disk in a disk group contains one or two copies of both the kernel change log and the configuration database. However, disk groups consisting of more than eight disks should not have copies on all disks; use a total of four to eight copies.

The number of kernel change log copies must be the same as the number of configuration database copies. For the best performance, the number of copies must be the same on each disk that contains copies.

Table 8-3 describes the guidelines for configuration database and kernel change log copies.

Table 8-3:  LSM Database and Kernel Change Log Guidelines

Disks Per Disk Group Size of Private Region (in Blocks) Configuration and Kernel Change Log Copies Per Disk
1 to 3 512 Two copies in each private region
4 to 8 512 One copy in each private region
9 to 32 512 One copy on four to eight disks, zero copies on remaining disks
33 to 128 1024 One copy on four to eight disks, zero copies on remaining disks
129 to 256 1536 One copy on four to eight disks, zero copies on remaining disks
257 or more 2048 One copy on four to eight disks, zero copies on remaining disks
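The sizing rules in Table 8-3 can be expressed as a simple lookup. The following Python sketch is illustrative only (the function name is hypothetical, and this is not part of any LSM utility):

```python
def lsm_private_region_plan(disks_in_group):
    """Return (private_region_blocks, copy_placement) per Table 8-3.

    Illustrative sketch of the guidelines; not an LSM tool.
    """
    if disks_in_group <= 3:
        return 512, "two copies in each private region"
    if disks_in_group <= 8:
        return 512, "one copy in each private region"
    if disks_in_group <= 32:
        return 512, "one copy on four to eight disks"
    if disks_in_group <= 128:
        return 1024, "one copy on four to eight disks"
    if disks_in_group <= 256:
        return 1536, "one copy on four to eight disks"
    return 2048, "one copy on four to eight disks"
```

For example, a 100-disk group would use 1024-block private regions with one copy on four to eight of the disks.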

8.3.1.7    Distributing the Database and Logs Across Different Buses

For disk groups with large numbers of disks, place the disks that contain configuration database and kernel change log copies on different buses. This provides you with better performance and higher availability.

8.3.2    LSM Mirrored Volume Configuration Recommendations

Use LSM mirrored volumes (RAID 1) for high data availability. A plex is a complete copy of a volume's data. To mirror a volume, you must set up at least two plexes. If a physical disk fails, the plex containing the failed disk becomes temporarily unavailable, but the remaining plexes are still available.

Mirroring can also improve read performance. However, a write to a mirrored volume results in parallel writes to each plex, so mirroring will degrade disk write performance. Environments whose disk I/O operations are predominantly reads obtain the best performance results from mirroring. See Table 8-4 for mirrored volume guidelines.

Use block-change logging (BCL) to improve the mirrored volume recovery rate when a system failure occurs by reducing the synchronization time. If BCL is enabled and a write is made to a mirrored plex, BCL identifies the block numbers that have changed and then stores the numbers on a logging subdisk. BCL is not used for reads.

BCL is enabled if two or more plexes in a mirrored volume have a logging subdisk associated with them. Only one logging subdisk can be associated with a plex.

BCL can add some overhead to your system and degrade the mirrored volume's write performance. However, the impact is less for systems under a heavy I/O load, because multiple writes to the log are batched into a single write. See Table 8-5 for BCL configuration guidelines.
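The BCL write and recovery paths described above can be modeled with a toy sketch. This is a conceptual illustration only (the class and method names are hypothetical, not LSM code):

```python
class BCLMirror:
    """Toy model of block-change logging on a two-plex mirrored volume."""

    def __init__(self):
        self.plexes = [dict(), dict()]  # two mirrored copies of the data
        self.bcl = set()                # logging subdisk: changed block numbers

    def write(self, block, data):
        self.bcl.add(block)             # BCL records the block number first
        for plex in self.plexes:        # then the write goes to every plex
            plex[block] = data

    def resync_after_failure(self):
        # Recovery copies only the logged blocks between plexes,
        # instead of resynchronizing the entire volume.
        for block in self.bcl:
            self.plexes[1][block] = self.plexes[0][block]
        self.bcl.clear()
```

The point of the model is the recovery step: because only logged block numbers are resynchronized, the synchronization time after a failure is proportional to the number of changed blocks, not the volume size.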

Note

BCL will be replaced by dirty region logging (DRL) in a future release.

You may want to combine mirroring (RAID 1) with striping (RAID 0) to provide high availability and balance the disk I/O load. See Section 8.3.3 for striping guidelines.

Table 8-4 describes LSM mirrored volume configuration recommendations and lists performance benefits as well as tradeoffs.

Table 8-4:  LSM Mirrored Volume Guidelines

Recommendation Benefit Tradeoff
Map mirrored plexes across different buses (Section 8.3.2.1) Improves performance and increases availability None
Use the appropriate read policy (Section 8.3.2.2) Efficiently distributes reads None
Attach up to eight plexes to the same volume (Section 8.3.2.3) Improves performance for read-intensive workloads and increases availability Uses disk space inefficiently
Use a symmetrical configuration (Section 8.3.2.4) Provides more predictable performance None
Use block-change logging (BCL) (Table 8-5) Improves mirrored volume recovery rate May decrease write performance
Stripe the mirrored volumes (Table 8-6) Improves disk I/O performance and balances I/O load Increases management complexity

Table 8-5 describes LSM block-change logging (BCL) configuration recommendations and lists performance benefits as well as tradeoffs.

Table 8-5:  LSM Block-Change Logging Guidelines

Recommendation Benefit Tradeoff
Configure multiple logging subdisks (Section 8.3.2.5) Improves recovery time Requires additional disks
Use a write-back cache for logging subdisks (Section 8.3.2.6) Minimizes BCL's write degradation Cost of hardware RAID subsystem
Use the appropriate BCL subdisk size (Section 8.3.2.7) Enables migration to dirty region logging None
Place logging subdisks on infrequently used disks (Section 8.3.2.8) Helps to prevent disk bottlenecks None
Use solid-state disks for logging subdisks (Section 8.3.2.9) Minimizes BCL's write degradation Cost of solid-state disks

The following sections describe the previous LSM mirrored volume and BCL recommendations in detail.

8.3.2.1    Mirroring Volumes Across Different Buses

Putting each mirrored plex on a different bus improves performance by enabling simultaneous I/O operations. Mirroring across different buses also increases availability by protecting against bus and adapter failure.

8.3.2.2    Choosing a Read Policy for a Mirrored Volume

To provide optimal performance for different types of mirrored volumes, LSM supports multiple read policies, including round-robin, which distributes reads evenly across all plexes, and preferred read, which directs all reads to a designated plex.

If one plex exhibits superior performance, either because the plex is striped across multiple disks or because it is located on a much faster device, set the read policy to preferred read for that plex. For example, a mirrored volume with one striped plex should usually have the striped plex configured as the preferred read plex. Otherwise, use the round-robin read policy.
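The difference between the two policies can be shown with a minimal sketch (the class and names are hypothetical, not LSM code):

```python
import itertools

class MirroredVolumeReads:
    """Toy model of the round-robin and preferred read policies."""

    def __init__(self, plexes, preferred=None):
        self.preferred = preferred          # preferred-read plex, if set
        self._rr = itertools.cycle(plexes)  # round-robin over all plexes

    def plex_for_read(self):
        if self.preferred is not None:
            return self.preferred           # preferred read: one fast plex
        return next(self._rr)               # round-robin: spread the reads
```

With no preferred plex, successive reads alternate across the plexes; with a preferred (for example, striped) plex, every read goes to it.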

8.3.2.3    Using Multiple Plexes in a Mirrored Volume

To improve performance for read-intensive workloads, up to eight plexes can be attached to the same mirrored volume. However, this configuration does not use disk space efficiently.

8.3.2.4    Using a Symmetrical Configuration

A symmetrical mirrored disk configuration provides predictable performance and easy management. Use the same number of disks in each mirrored plex. For mirrored striped volumes, you can stripe across half of the available disks to form one plex and across the other half to form the other plex.

8.3.2.5    Using Multiple BCL Subdisks

Using multiple block-change logging (BCL) subdisks will improve recovery time after a failure.

8.3.2.6    Using a Write-Back Cache with LSM

To minimize BCL's impact on write performance, use LSM in conjunction with a RAID subsystem that has a write-back cache. Typically, the BCL performance degradation is more significant on systems with few writes than on systems with heavy write loads.

8.3.2.7    Sizing BCL Subdisks

To support migration from BCL to dirty region logging (DRL), which will be supported in a future release, use the appropriate BCL subdisk size.

If you have less than 64 GB of disk space under LSM control, allow 1 block for each gigabyte of storage; if the result is an odd number, add 1 block, and if the result is an even number, add 2 blocks.

For example, if you have 1 GB (or less) of space, use a 2-block subdisk. If you have 2 GB (or 3 GB) of space, use a 4-block subdisk.

If you have more than 64 GB of disk space under LSM control, use a 64-block subdisk.
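The sizing rule can be written as a short calculation. The following is an illustrative Python sketch (the function name is hypothetical, and the behavior at exactly 64 GB is an assumption, since the text covers only "less than" and "more than" 64 GB):

```python
import math

def bcl_subdisk_blocks(lsm_storage_gb):
    """Size of a BCL logging subdisk in blocks, per the rule above.

    Under 64 GB: 1 block per GB of storage (rounded up), then add
    1 block if the result is odd or 2 blocks if it is even.
    At 64 GB and above: 64 blocks (the exact boundary is an assumption).
    """
    if lsm_storage_gb >= 64:
        return 64
    blocks = max(1, math.ceil(lsm_storage_gb))  # 1 block per GB
    return blocks + (1 if blocks % 2 else 2)    # odd: +1, even: +2
```

This reproduces the examples in the text: 1 GB gives a 2-block subdisk, and 2 GB or 3 GB gives a 4-block subdisk.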

8.3.2.8    Placing BCL Logging Subdisks on Infrequently Used Disks

Place logging subdisks on infrequently used disks. Because these subdisks are frequently written, do not put them on busy disks. In addition, do not configure BCL subdisks on the same disks as the volume data, because this will cause head seeking or thrashing.

8.3.2.9    Using Solid-State Disks for BCL Subdisks

If persistent (nonvolatile) solid-state disks are available, use them for logging subdisks.

8.3.3    LSM Striped Volume Configuration Recommendations

Striping volumes (RAID 0) with LSM enables parallel I/O streams to operate concurrently on separate devices, which distributes the disk I/O load and improves performance. Striping is especially effective for applications that perform large sequential data transfers or multiple, simultaneous I/O operations.

Striping distributes data in fixed-size portions (stripes) across the disks in a volume. The stripes are interleaved across the striped plex's subdisks, which are located on different disks to evenly distribute the disk I/O.

The performance benefit of striping depends on the stripe width, which is the number of blocks in a stripe, and how your users and applications perform I/O. Bandwidth increases with the number of disks across which a plex is striped.
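The interleaving described above is simple modular arithmetic. The following Python sketch illustrates how a logical block maps to a subdisk in a striped plex (the function is hypothetical, for illustration only):

```python
def stripe_map(block, stripe_width, ndisks):
    """Map a logical block to (disk_index, block_within_disk).

    Illustrative only: stripes of stripe_width blocks are interleaved
    round-robin across ndisks subdisks.
    """
    stripe = block // stripe_width  # which stripe holds this block
    disk = stripe % ndisks          # stripes interleave across the disks
    offset = (stripe // ndisks) * stripe_width + block % stripe_width
    return disk, offset
```

For example, with a 128-sector stripe width on four disks, blocks 0-127 land on disk 0, blocks 128-255 on disk 1, and so on, wrapping back to disk 0 at block 512.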

You can combine mirroring (RAID 1) with striping to obtain high availability. However, mirroring will decrease write performance. See Section 8.3.2 for mirroring guidelines.

Table 8-6 describes the LSM striped volume configuration recommendations and lists performance benefits as well as tradeoffs.

Table 8-6:  LSM Striped Volume Guidelines

Recommendation Benefit Tradeoff
Use multiple disks in a striped volume (Section 8.3.3.1) Improves performance Decreases volume reliability
Distribute subdisks across different disks and buses (Section 8.3.3.2) Improves performance and increases availability None
Use the appropriate stripe width (Section 8.3.3.3) Improves performance None
Avoid splitting small data transfers (Section 8.3.3.3) Improves the performance of volumes that quickly receive multiple data transfers May use disk space inefficiently
Split large individual data transfers (Section 8.3.3.3) Improves the performance of volumes that receive large data transfers Decreases throughput

The following sections describe the previous LSM striped volume configuration recommendations in detail.

8.3.3.1    Increasing the Number of Disks in a Striped Volume

Increasing the number of disks in a striped volume can increase the bandwidth, depending on the applications and file systems you are using and on the number of simultaneous users. However, this reduces the effective mean-time-between-failures (MTBF) of the volume. If this reduction is a problem, use both mirroring and striping. See Section 8.3.2 for mirroring guidelines.

8.3.3.2    Distributing Striped Volume Subdisks Across Different Buses

Distribute the subdisks of a striped volume across different buses. This improves performance and helps to prevent a single bus from becoming a bottleneck.

8.3.3.3    Choosing the Correct Stripe Width

The performance benefit of striping depends on the size of the stripe width and the characteristics of the I/O load. Stripes of data are allocated alternately and evenly to the subdisks of a striped plex. A striped plex consists of a number of equal-sized subdisks located on different disks.

The number of blocks in a stripe determines the stripe width. LSM uses a default stripe width of 64 KB (or 128 sectors), which works well in most environments.

Use the volstat command to determine the number of data transfer splits. For volumes that receive only small I/O transfers, you may not want to use striping because disk access time is important. Striping is most beneficial for large data transfers.

To improve performance of large sequential data transfers, use a stripe width that will divide each individual data transfer and distribute the blocks equally across the disks.

To improve the performance of multiple simultaneous small data transfers, make the stripe width the same size as the data transfer. However, an excessively small stripe width can result in poor system performance.
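The two sizing guidelines above can be sketched as a simple helper (a hypothetical function, not an LSM tool; measure your actual workload with volstat before changing the default):

```python
SECTOR = 512  # bytes; LSM's default stripe width is 64 KB = 128 sectors

def stripe_width_sectors(transfer_bytes, ndisks, workload):
    """Suggest a stripe width in sectors for the cases described above.

    'sequential': split each large transfer evenly across all disks.
    'small-concurrent': keep each transfer on one disk (width = transfer size).
    Anything else: keep the 64 KB default.
    """
    if workload == "sequential":
        return max(1, transfer_bytes // (ndisks * SECTOR))
    if workload == "small-concurrent":
        return max(1, transfer_bytes // SECTOR)
    return 128  # LSM default stripe width
```

For example, 1-MB sequential transfers across four disks suggest a 512-sector stripe width, while concurrent 8-KB transfers suggest a 16-sector width so that each transfer stays on a single disk.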

If you are striping mirrored volumes, ensure that the stripe width is the same for each plex.

8.3.4    Gathering LSM Information

Table 8-7 describes the tools you can use to obtain information about the Logical Storage Manager (LSM).

Table 8-7:  LSM Monitoring Tools

Name Use Description

volprint

Displays LSM configuration information (Section 8.3.4.1)

Displays information about LSM disk groups, disk media, volumes, plexes, and subdisk records. It does not display disk access records. See volprint(8) for more information.

volstat

Monitors LSM performance statistics (Section 8.3.4.2)

Displays performance statistics since boot time for all LSM objects (volumes, plexes, subdisks, and disks). These statistics include information about read and write operations, including the total number of operations, the number of failed operations, the number of blocks read or written, and the average time spent on the operation in a specified interval of time. The volstat utility also can reset the I/O statistics. See volstat(8) for more information.

voltrace

Tracks LSM operations (Section 8.3.4.3)

Sets I/O tracing masks against one or all volumes in the LSM configuration and logs the results to the LSM default event log, /dev/volevent. The utility also formats and displays the tracing mask information and can trace the following ongoing LSM events: requests to logical volumes, requests that LSM passes to the underlying block device drivers, and I/O events, errors, and recoveries. See voltrace(8) for more information.

volwatch

Monitors LSM events (Section 8.3.4.4)

Monitors LSM for failures in disks, volumes, and plexes, and sends mail if a failure occurs. The volwatch script starts automatically when you install LSM. See volwatch(8) for more information.

dxlsm

Monitors LSM objects (Section 8.3.4.5)

Using the Analyze menu, displays information about LSM disks, volumes, and subdisks. See dxlsm(8) for more information.

The following sections describe some of these commands in detail.

8.3.4.1    Displaying LSM Configuration Information by Using the volprint Utility

The volprint utility displays information from records in the LSM configuration database. You can select the records to be displayed by name or by using special search expressions. In addition, you can display record association hierarchies, so that the structure of records is more apparent.

Use the volprint utility to display disk group, disk media, volume, plex, and subdisk records. Invoke the voldisk list command to display disk access records or physical disk information.

The following example uses the volprint utility to show the status of the voldev1 volume:


# /usr/sbin/volprint -ht voldev1
DG NAME        GROUP-ID
DM NAME        DEVICE       TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME        USETYPE      KSTATE   STATE    LENGTH   READPOL  PREFPLEX
PL NAME        VOLUME       KSTATE   STATE    LENGTH   LAYOUT   ST-WIDTH MODE
SD NAME        PLEX         PLOFFS   DISKOFFS LENGTH   DISK-NAME    DEVICE
 
v  voldev1     fsgen        ENABLED  ACTIVE   804512   SELECT   -
pl voldev1-01  voldev1      ENABLED  ACTIVE   804512   CONCAT   -        RW
sd rz8-01      voldev1-01   0        0        804512   rz8          rz8
pl voldev1-02  voldev1      ENABLED  ACTIVE   804512   CONCAT   -        RW
sd dev1-01     voldev1-02   0        2295277  402256   dev1         rz9
sd rz15-02     voldev1-02   402256   2295277  402256   rz15         rz15

See volprint(8) for more information.

8.3.4.2    Monitoring LSM Performance Statistics by Using the volstat Utility

The volstat utility provides information about activity on volumes, plexes, subdisks, and disks under LSM control. It reports statistics that reflect the activity levels of LSM objects since boot time.

The amount of information displayed depends on which options you specify to the volstat utility. For example, you can display statistics for a specific LSM object, or you can display statistics for all objects at one time. If you specify a disk group, only statistics for objects in that disk group are displayed. If you do not specify a particular disk group, the volstat utility displays statistics for the default disk group (rootdg).

You can also use the volstat utility to reset the base statistics to zero. This can be done for all objects or for only specified objects. Resetting the statistics to zero before a particular operation makes it possible to measure the subsequent impact of that operation.

The following example uses the volstat utility to display statistics on LSM volumes:

# /usr/sbin/volstat
OPERATIONS       BLOCKS        AVG TIME(ms)
TYP NAME        READ   WRITE    READ    WRITE   READ   WRITE
vol archive      865     807    5722     3809   32.5    24.0
vol home        2980    5287    6504    10550   37.7   221.1
vol local      49477   49230  507892   204975   28.5    33.5
vol src        79174   23603  425472   139302   22.4    30.9
vol swapvol    22751   32364  182001   258905   25.3   323.2

See volstat(8) for more information.

8.3.4.3    Tracking LSM Operations by Using the voltrace Utility

The voltrace utility reads an event log (/dev/volevent) and prints formatted event log records to standard output. Using the voltrace utility, you can set event trace masks to determine the type of events to track. For example, you can trace I/O events, configuration changes, or I/O errors.

The following example uses the voltrace utility to display status on all new events:

# /usr/sbin/voltrace -n -e all
18446744072623507277 IOTRACE 439: req 3987131 v:rootvol p:rootvol-01 \
  d:root_domain s:rz3-02 iot write lb 0 b 63120 len 8192 tm 12
18446744072623507277 IOTRACE 440: req 3987131 \
  v:rootvol iot write lb 0 b 63136 len 8192 tm 12

See voltrace(8) for more information.

8.3.4.4    Monitoring LSM Events by Using the volwatch Script

The volwatch shell script starts automatically when you install LSM. The script monitors LSM for configuration events, such as a plex detach caused by a disk failure, and sends mail when an event occurs. By default, mail is sent to root; you can specify another recipient.

See volwatch(8) for more information.

8.3.4.5    Monitoring LSM by Using the dxlsm Graphical User Interface

The LSM Visual Administrator, the dxlsm graphical user interface (GUI), allows you to graphically manipulate LSM objects and manage the LSM configuration. The dxlsm GUI also includes an Analyze menu that allows you to display statistics about volumes, LSM disks, and subdisks. The information is graphically displayed, using colors and patterns on the disk icons, and numerically, using the Analysis Statistics form.

You can use the Analysis Parameters form to customize the displayed information.

See the Logical Storage Manager manual and dxlsm(8X) for more information.

8.3.5    Improving LSM Performance

You may be able to improve LSM performance by modifying an LSM subsystem attribute or by performing administrative tasks, such as balancing the disk I/O load across volumes. Be sure that you have followed the configuration guidelines described in Section 8.3.1, Section 8.3.2, and Section 8.3.3.

8.4    Managing Hardware RAID Subsystem Performance

Hardware RAID subsystems provide RAID functionality for high performance and high availability, relieve the CPU of disk I/O overhead, and enable you to connect many disks to a single I/O bus. There are various types of hardware RAID subsystems with different performance and availability features, but they all include a RAID controller, disks in enclosures, cabling, and disk management software.

RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features, such as write-back caches and complete component redundancy.

Hardware RAID subsystems use disk management software, such as the RAID Configuration Utility (RCU) and the StorageWorks Command Console (SWCC) utility, to manage the RAID devices. Menu-driven interfaces allow you to select RAID levels.

Use hardware RAID to combine multiple disks into a single storage set that the system sees as a single unit. A storage set can consist of a simple set of disks, a striped set, a mirrored set, or a RAID set. You can create LSM volumes, AdvFS file domains, or UFS file systems on a storage set, or you can use the storage set as a raw device.

There are various types of hardware RAID subsystems, which provide different degrees of performance and availability at various costs.

See the Compaq Systems & Options Catalog for detailed information about hardware RAID subsystem features.

Table 8-8 describes the hardware RAID subsystem configuration recommendations and lists performance benefits as well as tradeoffs.

Table 8-8:  Hardware RAID Subsystem Configuration Guidelines

Recommendation | Performance Benefit | Tradeoff
Evenly distribute disks in a storage set across different buses (Section 8.4.1) | Improves performance and helps to prevent bottlenecks | None
Configure devices for multipath concurrent access | Improves throughput and increases availability | Cost of devices that support this feature
Use disks with the same data capacity in each storage set (Section 8.4.2) | Simplifies storage management | None
Use an appropriate stripe size (Section 8.4.3) | Improves performance | None
Mirror striped sets (Section 8.4.4) | Provides availability and distributes the disk I/O load | Increases configuration complexity and may decrease write performance
Use a write-back cache (Section 8.4.5) | Improves write performance | Cost of hardware
Use dual-redundant RAID controllers (Section 8.4.6) | Improves performance, increases availability, and prevents I/O bus bottlenecks | Cost of hardware
Install spare disks (Section 8.4.7) | Improves availability | Cost of disks
Replace failed disks promptly (Section 8.4.7) | Improves performance | None

The following sections describe some of these guidelines. See your RAID subsystem documentation for detailed configuration information.

8.4.1    Distributing Storage Set Disks Across Buses

You can improve performance and help to prevent bottlenecks by distributing storage set disks evenly across different buses.

In addition, make sure that the first member of each mirrored set is on a different bus.

8.4.2    Using Disks with the Same Data Capacity

Use disks with the same capacity in the same storage set. This simplifies storage management, because the usable capacity of a storage set is typically limited by its smallest member disk; mixing capacities wastes the extra space on the larger disks.

8.4.3    Choosing the Correct Stripe Size

You must understand how your applications perform disk I/O before you can choose the stripe (chunk) size that will provide the best performance benefit. See Section 2.1 for information about determining a resource model for your system.

If the stripe size is large compared to the average I/O size, each disk in a stripe set can service a separate data transfer. Multiple I/O operations can then be handled in parallel, which increases throughput. This can improve performance for environments that perform large numbers of concurrent I/O operations, including transaction processing, office automation, and file services environments, and for environments that perform multiple random read and write operations.

If the stripe size is smaller than the average I/O size, multiple disks can simultaneously handle a single I/O operation, which can increase bandwidth and improve sequential file processing. This is beneficial for image processing and data collection environments. However, do not make the stripe size so small that it degrades performance for large sequential data transfers.

For example, if you use an 8-KB stripe size, small data transfers will be distributed evenly across the member disks, but a 64-KB data transfer will be divided into at least eight data transfers.

If your applications are doing I/O to a raw device and not a file system, use a stripe size that distributes a single data transfer evenly across the member disks. For example, if the typical I/O size is 1 MB and you have a four-disk array, you could use a 256-KB stripe size. This would distribute the data evenly among the four member disks, with each doing a single 256-KB data transfer in parallel.
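The arithmetic in these examples can be sketched as follows. The round-robin mapping below is a simplified model used for illustration, not the controller's actual placement algorithm:

```python
# Sketch of how a striped set divides one transfer into chunk-sized pieces.
# Data is laid out round-robin, stripe_kb at a time, across ndisks members.

def split_transfer(offset_kb, size_kb, stripe_kb, ndisks):
    """Return the list of (disk, offset_kb, length_kb) pieces that one
    transfer against a striped set is divided into."""
    pieces = []
    pos = offset_kb
    end = offset_kb + size_kb
    while pos < end:
        chunk = pos // stripe_kb              # which stripe chunk holds pos
        disk = chunk % ndisks                 # round-robin chunk placement
        chunk_end = (chunk + 1) * stripe_kb   # first KB past this chunk
        length = min(end, chunk_end) - pos
        pieces.append((disk, pos, length))
        pos += length
    return pieces

# A 64-KB transfer with an 8-KB stripe size is divided into 8 transfers:
print(len(split_transfer(0, 64, 8, 4)))                  # -> 8

# A 1-MB raw-device transfer with a 256-KB stripe size on a four-disk
# array becomes one 256-KB transfer per member disk, done in parallel:
print([(d, l) for d, _, l in split_transfer(0, 1024, 256, 4)])
# -> [(0, 256), (1, 256), (2, 256), (3, 256)]
```

An unaligned transfer can touch one more chunk than an aligned one, which is why the text says a 64-KB transfer is divided into at least eight pieces.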

For small file system I/O operations, use a stripe size that is a multiple of the typical I/O size (for example, four to five times the I/O size). This will help to ensure that the I/O is not split across disks.

You may want to choose a stripe size that will prevent any particular range of blocks from becoming a bottleneck. For example, if an application often uses a particular 8-KB block, you may want to use a stripe size that is slightly larger or smaller than 8 KB or is a multiple of 8 KB to force the data onto a different disk.
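The effect of changing the stripe size on where a hot block lands can be sketched with the same round-robin model (illustrative only; the hot-block offset below is an assumed value):

```python
# Which member disk serves a given offset under round-robin striping.
# Changing the stripe size moves a frequently used block range onto a
# different member disk.

def disk_for(offset_kb, stripe_kb, ndisks):
    """Member disk that holds the data at offset_kb."""
    return (offset_kb // stripe_kb) % ndisks

hot_block_kb = 24   # assumed offset of a frequently accessed 8-KB block
for stripe_kb in (8, 16, 32):
    print(stripe_kb, disk_for(hot_block_kb, stripe_kb, 4))
# With a four-disk set, the same offset lands on disk 3, 1, or 0
# depending on the stripe size chosen.
```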

8.4.4    Mirroring Striped Sets

Striped disks improve I/O performance by distributing the disk I/O load. Mirroring striped disks provides high availability, but can decrease write performance, because each write operation results in two disk writes.
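The write penalty can be expressed as simple arithmetic (a model under idealized assumptions, not measured figures):

```python
# Model of mirrored-stripe write cost: every logical write goes to each
# mirror copy, so physical writes = logical writes * copies. Assumes
# writes spread evenly across the stripe members.

def writes_per_disk(logical_writes, ndisks, mirror_copies=1):
    """Physical writes each member disk absorbs."""
    return logical_writes * mirror_copies / ndisks

# 1000 writes to a 4-disk stripe set: 250 physical writes per disk.
print(writes_per_disk(1000, 4))                    # -> 250.0
# Mirroring the set doubles total physical writes (2000 instead of 1000),
# although with 8 disks the per-disk load is unchanged:
print(writes_per_disk(1000, 8, mirror_copies=2))   # -> 250.0
```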

8.4.5    Using a Write-Back Cache

RAID subsystems support, either as a standard feature or as an option, a nonvolatile (battery-backed) write-back cache that can improve disk I/O performance while maintaining data integrity. A write-back cache improves performance for systems that perform large numbers of writes. Applications that perform few writes will not benefit from a write-back cache.

With write-back caching, data intended to be written to disk is temporarily stored in the cache and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
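The consolidation step can be sketched as follows; this is an illustrative model of the idea, not the controller firmware's actual algorithm:

```python
# Sketch of write-back cache consolidation: contiguous dirty extents from
# separate host writes are merged so that each flush issues one larger
# disk write instead of several small ones.

def coalesce(writes):
    """writes: list of (start_block, nblocks) host writes.
    Return the merged list of contiguous extents to flush."""
    merged = []
    for start, n in sorted(writes):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Extent touches or overlaps the previous one: extend it.
            prev_start, prev_n = merged[-1]
            merged[-1] = (prev_start, max(prev_n, start + n - prev_start))
        else:
            merged.append((start, n))
    return merged

# Three 8-block host writes to adjacent addresses flush as one 24-block write:
print(coalesce([(0, 8), (8, 8), (16, 8)]))   # -> [(0, 24)]
# Non-contiguous writes still flush separately:
print(coalesce([(0, 8), (32, 8)]))           # -> [(0, 8), (32, 8)]
```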

A write-back cache improves performance because writes appear to be executed immediately. If a failure occurs, upon recovery the RAID controller detects any unwritten data that still exists in the write-back cache and writes the data to disk before enabling normal controller operations.

A write-back cache must be battery-backed to protect against data loss and corruption.

If you are using an HSZ40 or HSZ50 RAID controller with a write-back cache, the following guidelines may improve performance:

See the RAID subsystem documentation for more information about using the write-back cache.

8.4.6    Using Dual-Redundant Controllers

If supported by your RAID subsystem, you can use a dual-redundant controller configuration and balance the number of disks across the two controllers. This can improve performance, increase availability, and prevent I/O bus bottlenecks.

8.4.7    Using Spare Disks to Replace Failed Disks

Install predesignated spare disks on separate controller ports and storage shelves. This will help you to maintain data availability and recover quickly if a disk failure occurs.

8.5    Tuning CAM

The Common Access Method (CAM) is the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1 KB of physical memory. Monitor these pools (see Section 8.2.2) and tune them if necessary.

You can modify the following io subsystem attributes to improve CAM performance:

If the I/O pattern associated with your system tends to have intermittent bursts of I/O operations (I/O spikes), increasing the values of the cam_ccb_pool_size and cam_ccb_increment attributes may improve performance.
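For example, you can query the current values with the sysconfig command and make larger values permanent in the /etc/sysconfigtab file. The values shown here are placeholders for illustration, not recommendations; see sysconfig(8) for details:

```
# Display the current CCB pool settings:
sysconfig -q io cam_ccb_pool_size cam_ccb_increment

# To make larger values permanent, add an io stanza to /etc/sysconfigtab:
io:
    cam_ccb_pool_size = 400
    cam_ccb_increment = 100
```

Remember that each buffer consumes approximately 1 KB of physical memory, so monitor the pools as described in Section 8.2.2 before increasing these attributes.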