There are various ways that you can manage your disk storage. Depending on your performance and availability needs, you can use static disk partitions, the Logical Storage Manager (LSM), hardware RAID, or a combination of these solutions.
The disk storage configuration can have a significant impact on system performance, because disk I/O is used for file system operations and also by the virtual memory subsystem for paging and swapping.
You may be able to improve disk I/O performance by following the configuration and tuning guidelines described in this chapter, which describes how to perform the following tasks:
Improve performance by efficiently distributing the disk I/O load (Section 8.1)
Obtain information about the disk storage configuration and performance (Section 8.2)
Manage LSM performance (Section 8.3)
Improve hardware RAID subsystem performance (Section 8.4)
Tune the Common Access Method (CAM) (Section 8.5)
To configure a disk storage subsystem that will meet your performance and availability needs, you must first understand your workload resource model, as described in Section 2.1.
Distributing the disk I/O load across devices helps to prevent a single disk, controller, or bus from becoming a bottleneck and also allows simultaneous I/O operations.
For example, if you have 16 GB of disk storage, you may get better performance from sixteen 1-GB disks rather than four 4-GB disks. More spindles (disks) may allow more simultaneous operations. For random I/O operations, 16 disks may be simultaneously seeking instead of 4 disks. For large sequential data transfers, 16 data streams can be simultaneously working instead of 4 data streams.
RAID 0 (disk striping) enables you to efficiently distribute disk data across the disks in a stripe set. See Section 8.3.3 and Section 8.4 for more information.
If you are using
file systems, place the most frequently used file systems on different disks
and optimally on different buses.
Directories containing executable files
or temporary files are often frequently accessed (for example,
/var,
/usr, and
/tmp).
If possible,
place
/usr
and
/tmp
on different disks.
Guidelines for distributing disk I/O also apply to swap devices. See Section 6.2 for more information about configuring swap devices for high performance.
Table 8-1 describes the tools you can use to obtain information about basic disk activity and usage.
| Name | Use | Description |
| sys_check | Analyzes system configuration and displays statistics (Section 4.2) | Creates an HTML file that describes the system configuration, and can be used to diagnose problems. |
| iostat | Displays disk and CPU usage (Section 8.2.1) | Displays transfer statistics for each disk, and the percentage of time the system has spent in user mode, in user mode running at low priority, in system mode, and in idle mode. |
| dbx print nchstats | Reports namei cache statistics (Section 9.1.2) | Reports namei cache statistics, including hit rates. |
| dbx | Reports Common Access Method (CAM) statistics (Section 8.2.2) | Reports CAM statistics, including information about buffers and completed I/O operations. |
| diskx | Tests disk driver functionality | Reads and writes data to disk partitions. |
The
iostat
command reports I/O statistics for terminals,
disks, and the CPU that you can use to diagnose disk I/O performance problems.
The first line of the output is the average since boot time, and
each subsequent report is for the last interval.
You can also specify a disk
name in the command line to output information only about that disk.
An example of the
iostat
command is as follows; output
is provided in one-second intervals:
# /usr/ucb/iostat 1
      tty        fd0        rz0        rz1        dk3        cpu
 tin tout   bps  tps   bps  tps   bps  tps   bps  tps   us ni sy id
   1   73     0    0    23    2    37    3     0    0    5  0 17 79
   0   58     0    0    47    5   204   25     0    0    8  0 14 77
   0   58     0    0     8    1    62    1     0    0   27  0 27 46
   0   58     0    0     8    1     0    0     0    0   22  0 31 46
The
iostat
command output
displays the following information:
For each disk (rzn), the number of kilobytes transferred per second (bps) and the number of transfers per second (tps).
For the system (cpu), the percentage of
time the CPU has spent in user state running processes either at their default
priority or preferred priority (us), in user mode running
processes at a less favored priority (ni), in system mode
(sy), and in idle mode (id).
This information
enables you to determine how disk I/O is affecting the CPU.
User mode includes
the time the CPU spent executing library routines.
System mode includes the
time the CPU spent executing system calls.
The
iostat
command can help you to do the following:
Determine which disk is being used the most and which
is being used the least.
This information will help you determine how to distribute
your file systems and swap space.
Use the
swapon -s
command
to determine which disks are used for swap space.
If the
iostat
command output shows a lot of disk
activity and a high system idle time, the system may be disk bound.
You
may need to balance the disk I/O load, defragment disks, or upgrade your
hardware.
If a disk is doing a large number of transfers (the
tps
field) but reading and writing only small amounts of data (the
bps
field), examine how your applications are doing disk I/O.
The
application may be performing a large number of I/O operations to handle only
a small amount of data.
You may want to rewrite the application if this
behavior is not necessary.
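As a rough sketch of this check, the ratio of the bps and tps values that iostat reports for a disk gives the average number of kilobytes moved per transfer. The following shell function is illustrative only (its name is not a system command), and assumes you pass it the two values from one iostat sample:

```shell
#!/bin/sh
# Illustrative helper (not a system command): given the bps (KB/s) and
# tps (transfers/s) figures that iostat reports for one disk, print the
# average number of kilobytes moved per transfer.
avg_kb_per_transfer() {
    bps=$1   # kilobytes per second for the disk
    tps=$2   # transfers per second for the disk
    if [ "$tps" -eq 0 ]; then
        echo 0          # idle disk; avoid dividing by zero
        return
    fi
    echo $((bps / tps))
}

# Example: a sample line showing 204 KB/s over 25 transfers/s.
avg_kb_per_transfer 204 25
```

A persistently small result combined with a large tps value suggests that the application is issuing many small I/O operations.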
The operating system uses the Common Access Method (CAM) as its interface to the hardware. CAM maintains the following data structures, which you can examine with the dbx debugger:
xpt_qhead--Contains information about
the current size of the buffer pool free list (xpt_nfree),
the current number of processes waiting for buffers (xpt_wait_cnt), and the total number of times that processes had to wait for
free buffers (xpt_times_wait).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print xpt_qhead
struct {
    xws = struct {
        x_flink = 0xffffffff81f07400
        x_blink = 0xffffffff81f03000
        xpt_flags = 2147483656
        xpt_ccb = (nil)
        xpt_nfree = 300
        xpt_nbusy = 0
    }
    xpt_wait_cnt = 0
    xpt_times_wait = 2
    xpt_ccb_limit = 1048576
    xpt_ccbs_total = 300
    x_lk_qhead = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
If the value for the
xpt_wait_cnt
field is not zero,
CAM has run out of buffer pool space.
If this situation persists, you may
be able to eliminate the problem by changing one or more of CAM's I/O attributes
(see
Section 8.5).
ccmn_bp_head--Provides statistics
on the buffer structure pool.
This pool is used for raw I/O to disk.
The
information provided is the current size of the buffer structure
pool (num_bp) and the wait count for buffers (bp_wait_cnt).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ccmn_bp_head
struct {
    num_bp = 50
    bp_list = 0xffffffff81f1be00
    bp_wait_cnt = 0
}
(dbx)
If the value for the
bp_wait_cnt
field is not zero,
CAM has run out of buffer pool space.
If this situation persists, you may
be able to eliminate the problem by changing one or more of the CAM subsystem
attributes (see
Section 8.5).
xpt_cb_queue--Contains the linked list of I/O operations that have been completed and are waiting to be passed back to the peripheral drivers (cam_disk or cam_tape).
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print xpt_cb_queue
struct {
    flink = 0xfffffc00004d6828
    blink = 0xfffffc00004d6828
    flags = 0
    initialized = 1
    count = 0
    cplt_lock = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
The
count
field specifies the number of I/O operations
that have been completed and are ready to be passed back to a peripheral device
driver.
Normally, this value is 0 or 1.
If the value of
count
is temporarily greater than 1, it may indicate that a large number of I/O
operations are completing simultaneously.
If the value is consistently greater
than 1, it may indicate a problem.
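The wait-counter checks described above lend themselves to a small script. The following filter is an illustrative sketch (the function name is not a system command): it scans captured dbx output from `print xpt_qhead` or `print ccmn_bp_head` and reports any nonzero wait counter:

```shell
#!/bin/sh
# Illustrative filter (not a system command): scan saved dbx output and
# report a nonzero xpt_wait_cnt or bp_wait_cnt, either of which indicates
# that CAM has run out of buffer pool space.
check_cam_waits() {
    awk '/xpt_wait_cnt|bp_wait_cnt/ {
        if ($3 + 0 > 0)
            print $1 " is " $3 ": CAM ran out of buffer pool space"
        else
            print $1 " is 0: ok"
    }'
}

# Example with captured lines of dbx output:
printf 'xpt_wait_cnt = 0\nbp_wait_cnt = 2\n' | check_cam_waits
```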
The Logical Storage Manager (LSM) can improve system performance and provide high data availability with little additional overhead. LSM also provides you with online storage management features and enhanced performance information and statistics. Although any type of system can benefit from LSM, it is especially suited for configurations with large numbers of disks or configurations that regularly add storage.
LSM allows you to set up a shared storage pool that consists of multiple disks. You can create virtual disks (LSM volumes) from this pool of storage, according to your performance and capacity needs. LSM volumes are used in the same way as disk partitions. You can create UFS file systems and AdvFS file domains and filesets on an LSM volume, use a volume as a raw device, or create LSM volumes on top of RAID storage sets. You can also use LSM on swap disks.
LSM provides you with flexible and easy management for large storage configurations. Because there is no direct correlation between a virtual disk and a physical disk, file system or raw I/O can span disks, as needed. In addition, you can easily add disks to and remove disks from the pool, balance the load, and perform other storage management tasks.
LSM provides more cost-effective RAID functionality than a hardware RAID subsystem. When LSM is used to stripe or mirror disks, it is sometimes referred to as software RAID. LSM configurations are less complex than hardware RAID.
LSM supports the following basic disk management features:
Pool of storage
Load balancing by transparently moving data across disks
Disk concatenation (creating a large volume from multiple disks)
Detailed LSM performance information from the
volstat
command
The following advanced LSM disk management features require a license:
RAID 1 (disk mirroring)
Block-change logging (BCL), which improves the mirrored volume recovery rate
Striped swap disks
Graphical user interface (GUI) for easy disk management and detailed performance information
To obtain the best LSM performance, you must follow the configuration and tuning guidelines. The following sections contain information about:
Guidelines for disks, disk groups, and databases (Section 8.3.1)
Guidelines for mirrored disks (Section 8.3.2)
Guidelines for striped disks (Section 8.3.3)
Monitoring the LSM configuration and performance (Section 8.3.4)
Improving LSM performance (Section 8.3.5)
See the Logical Storage Manager manual for detailed information about using LSM.
There are general recommendations that you can use to configure LSM disks, disk groups, and databases for high performance. Each LSM disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. How you configure your LSM disks, disk groups, and databases determines the flexibility and performance of your LSM configuration.
Table 8-2 describes the LSM disk, disk group, and database configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Initialize your LSM disks as sliced disks (Section 8.3.1.1) | Provides greater storage configuration flexibility | None |
| Make the rootdg disk group a sufficient size (Section 8.3.1.2) | Ensures sufficient space for disk group information | None |
| Use a sufficient private region size for each disk in a disk group (Section 8.3.1.3) | Ensures sufficient space for database copies | Large private regions require more disk space |
| Make the private regions in a disk group the same size (Section 8.3.1.4) | Efficiently utilizes the configuration space | None |
| Organize disks into different disk groups according to function (Section 8.3.1.5) | Allows you to move disk groups between systems | Reduces flexibility when configuring volumes |
| Use an appropriate size and number of database and log copies (Section 8.3.1.6) | Ensures database availability and improves performance | None |
| Place disks containing database and log copies on different buses (Section 8.3.1.7) | Improves availability | Cost of additional hardware |
The following sections describe the previous recommendations in detail.
Initialize your LSM disks as sliced disks, instead of as simple disks. A sliced disk provides greater storage configuration flexibility because the entire disk is under LSM control. The disk label for a sliced disk contains information that identifies the partitions containing the private and the public regions. In contrast, simple disks have both public and private regions in the same partition.
You
must make sure that the
rootdg
disk group has an
adequate size, because the disk group's configuration database contains
records for disks outside of the
rootdg
disk group, in addition to the ordinary disk-group configuration
information.
For example, the
rootdg
configuration database includes disk-access records that define
all disks under LSM control.
The
rootdg
disk group must be large enough to contain
records for the disks in all the disk groups.
See
Table 8-3
for more information.
You must make sure that the private region for each disk has an adequate size. LSM keeps disk media label and configuration database copies in each disk's private region.
A private region must be large enough to accommodate the size of the LSM database copies. In addition, the maximum number of LSM objects (disks, subdisks, volumes, and plexes) in a disk group depends on an adequate private region size. However, a large private region requires more disk space. The default private region size is 1024 blocks, which is usually adequate for configurations using up to 128 disks per disk group.
The private region of each disk in a disk group should be the same size to efficiently utilize the configuration space. One or two LSM configuration database copies can be stored in a disk's private region.
When you add a new disk to an existing LSM disk group, the size of the private region on the new disk is determined by the private region size of the other disks in the disk group.
As you add more disks to a disk group,
the
voldiskadd
utility reduces the number of configuration
copies and log copies that are initialized for the new disks.
See
voldiskadd(8)
for more information.
You may want to organize disks in disk groups according to their function. This enables disk groups to be moved between systems, and decreases the size of the LSM configuration database for each disk group. However, using multiple disk groups reduces flexibility when configuring volumes.
Each disk group maintains a configuration database, which includes detailed information about mirrored and striped disks and volume, plex, and subdisk records. The LSM subsystem's overhead primarily involves managing the kernel change logs and copies of the configuration databases.
LSM performance is affected by the size and the number of copies of the configuration database and the kernel change log. They determine the amount of time it takes for LSM to start up, for changes to the configuration to occur, and for the LSM disks to fail over in a cluster.
Usually, each disk in a disk group contains one or two copies of both the kernel change log and the configuration database. However, disk groups consisting of more than eight disks should not keep copies on every disk; instead, configure from four to eight copies for the disk group.
The number of kernel change log copies must be the same as the number of configuration database copies. For the best performance, the number of copies must be the same on each disk that contains copies.
Table 8-3 describes the guidelines for configuration database and kernel change log copies.
| Disks Per Disk Group | Size of Private Region (in Blocks) | Configuration and Kernel Change Log Copies Per Disk |
| 1 to 3 | 512 | Two copies in each private region |
| 4 to 8 | 512 | One copy in each private region |
| 9 to 32 | 512 | One copy on four to eight disks, zero copies on remaining disks |
| 33 to 128 | 1024 | One copy on four to eight disks, zero copies on remaining disks |
| 129 to 256 | 1536 | One copy on four to eight disks, zero copies on remaining disks |
| 257 or more | 2048 | One copy on four to eight disks, zero copies on remaining disks |
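The private-region sizes in Table 8-3 can be encoded as a small lookup. The following shell function is illustrative only (the function name is not an LSM command); it returns the suggested private region size in blocks for a given number of disks per disk group:

```shell
#!/bin/sh
# Illustrative helper encoding Table 8-3: suggested private region size
# (in blocks) for the number of disks in an LSM disk group.
private_region_blocks() {
    disks=$1
    if   [ "$disks" -le 32 ];  then echo 512
    elif [ "$disks" -le 128 ]; then echo 1024
    elif [ "$disks" -le 256 ]; then echo 1536
    else                            echo 2048
    fi
}

# Example: a disk group with 100 disks.
private_region_blocks 100
```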
For disk groups with large numbers of disks, place the disks that contain configuration database and kernel change log copies on different buses. This provides you with better performance and higher availability.
Use LSM mirrored volumes (RAID 1) for high data availability. A plex is a complete copy of a volume's data; to mirror a volume, you must set up at least two plexes. If a physical disk fails, the plex containing the failed disk becomes temporarily unavailable, but the remaining plexes are still available.
Mirroring can also improve read performance. However, a write to a mirrored volume results in parallel writes to each plex, so mirroring will degrade disk write performance. Environments whose disk I/O operations are predominantly reads obtain the best performance results from mirroring. See Table 8-4 for mirrored volume guidelines.
Use block-change logging (BCL) to improve the mirrored volume recovery rate when a system failure occurs by reducing the synchronization time. If BCL is enabled and a write is made to a mirrored plex, BCL identifies the block numbers that have changed and then stores the numbers on a logging subdisk. BCL is not used for reads.
BCL is enabled if two or more plexes in a mirrored volume have a logging subdisk associated with them. Only one logging subdisk can be associated with a plex.
BCL can add some overhead to your system and degrade the mirrored volume's write performance. However, the impact is less for systems under a heavy I/O load, because multiple writes to the log are batched into a single write. See Table 8-5 for BCL configuration guidelines.
Note
BCL will be replaced by dirty region logging (DRL) in a future release.
You may want to combine mirroring (RAID 1) with striping (RAID 0) to provide high availability and balance the disk I/O load. See Section 8.3.3 for striping guidelines.
Table 8-4 describes LSM mirrored volume configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Map mirrored plexes across different buses (Section 8.3.2.1) | Improves performance and increases availability | None |
| Use the appropriate read policy (Section 8.3.2.2) | Efficiently distributes reads | None |
| Attach up to eight plexes to the same volume (Section 8.3.2.3) | Improves performance for read-intensive workloads and increases availability | Uses disk space inefficiently |
| Use a symmetrical configuration (Section 8.3.2.4) | Provides more predictable performance | None |
| Use block-change logging (BCL) (Table 8-5) | Improves mirrored volume recovery rate | May decrease write performance |
| Stripe the mirrored volumes (Table 8-6) | Improves disk I/O performance and balances I/O load | Increases management complexity |
Table 8-5 describes LSM block-change logging (BCL) configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Configure multiple logging subdisks (Section 8.3.2.5) | Improves recovery time | Requires additional disks |
| Use a write-back cache for logging subdisks (Section 8.3.2.6) | Minimizes BCL's write degradation | Cost of hardware RAID subsystem |
| Use the appropriate BCL subdisk size (Section 8.3.2.7) | Enables migration to dirty region logging | None |
| Place logging subdisks on infrequently used disks (Section 8.3.2.8) | Helps to prevent disk bottlenecks | None |
| Use solid-state disks for logging subdisks (Section 8.3.2.9) | Minimizes BCL's write degradation | Cost of solid-state disks |
The following sections describe the previous LSM mirrored volume and BCL recommendations in detail.
Putting each mirrored plex on a different bus improves performance by enabling simultaneous I/O operations. Mirroring across different buses also increases availability by protecting against bus and adapter failure.
To provide optimal performance for different types of mirrored volumes, LSM supports the following read policies:
Round-robin read
Satisfies read operations to the volume in a round-robin manner from all plexes in the volume.
Preferred read
Satisfies read operations from one specific plex (usually the plex with the highest performance).
Select
Selects a default read policy based on the plex associations to the volume. If the mirrored volume contains a single, enabled, striped plex, the default is to prefer that plex. For any other set of plex associations, the default is to use a round-robin policy.
If one plex exhibits superior performance, either because the plex is striped across multiple disks or because it is located on a much faster device, then set the read policy to preferred read for that plex. By default, a mirrored volume with one striped plex should have the striped plex configured as the preferred read. Otherwise, you should use the round-robin read policy.
To improve performance for read-intensive workloads, up to eight plexes can be attached to the same mirrored volume. However, this configuration does not use disk space efficiently.
A symmetrical mirrored disk configuration provides predictable performance and easy management. Use the same number of disks in each mirrored plex. For mirrored striped volumes, you can stripe across half of the available disks to form one plex and across the other half to form the other plex.
Using multiple block-change logging (BCL) subdisks will improve recovery time after a failure.
To minimize BCL's impact on write performance, use LSM in conjunction with a RAID subsystem that has a write-back cache. Typically, the BCL performance degradation is more significant on systems with few writes than on systems with heavy write loads.
To support migration from BCL to dirty region logging (DRL), which will be supported in a future release, use the appropriate BCL subdisk size.
If you have less than 64 GB of disk space under LSM control, start with 1 block for each gigabyte of storage. If the result is an odd number, add 1 block; if the result is an even number, add 2 blocks.
For example, if you have 1 GB (or less) of space, use a 2-block subdisk. If you have 2 GB (or 3 GB) of space, use a 4-block subdisk.
If you have more than 64 GB of disk space under LSM control, use a 64-block subdisk.
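The sizing rule above can be sketched as a shell function. This is illustrative only (the function name is not an LSM command), and treating exactly 64 GB the same as the "more than 64 GB" case is an assumption, since the text does not cover that boundary:

```shell
#!/bin/sh
# Illustrative helper (not an LSM command): BCL logging subdisk size in
# blocks for a given number of gigabytes under LSM control.
# Assumption: exactly 64 GB is treated like the "more than 64 GB" case.
bcl_subdisk_blocks() {
    gb=$1
    if [ "$gb" -ge 64 ]; then
        echo 64
        return
    fi
    if [ $((gb % 2)) -eq 1 ]; then
        echo $((gb + 1))    # odd result: add 1 block
    else
        echo $((gb + 2))    # even result: add 2 blocks
    fi
}

# Example: 3 GB of space under LSM control.
bcl_subdisk_blocks 3
```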
Place logging subdisks on infrequently used disks. Because these subdisks are frequently written, do not put them on busy disks. In addition, do not configure BCL subdisks on the same disks as the volume data, because this will cause head seeking or thrashing.
If persistent (nonvolatile) solid-state disks are available, use them for logging subdisks.
Striping volumes (RAID 0) with LSM enables parallel I/O streams to operate concurrently on separate devices, which distributes the disk I/O load and improves performance. Striping is especially effective for applications that perform large sequential data transfers or multiple, simultaneous I/O operations.
Striping distributes data in fixed-size portions (stripes) across the disks in a volume. The stripes are interleaved across the striped plex's subdisks, which are located on different disks to evenly distribute the disk I/O.
The performance benefit of striping depends on the stripe width, which is the number of blocks in a stripe, and how your users and applications perform I/O. Bandwidth increases with the number of disks across which a plex is striped.
You can combine mirroring (RAID 1) with striping to obtain high availability. However, mirroring will decrease write performance. See Section 8.3.2 for mirroring guidelines.
Table 8-6 describes the LSM striped volume configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Benefit | Tradeoff |
| Use multiple disks in a striped volume (Section 8.3.3.1) | Improves performance | Decreases volume reliability |
| Distribute subdisks across different disks and buses (Section 8.3.3.2) | Improves performance and increases availability | None |
| Use the appropriate stripe width (Section 8.3.3.3) | Improves performance | None |
| Avoid splitting small data transfers (Section 8.3.3.3) | Improves the performance of volumes that quickly receive multiple data transfers | May use disk space inefficiently |
| Split large individual data transfers (Section 8.3.3.3) | Improves the performance of volumes that receive large data transfers | Decreases throughput |
The following sections describe the previous LSM striped volume configuration recommendations in detail.
Increasing the number of disks in a striped volume can increase the bandwidth, depending on the applications and file systems you are using and on the number of simultaneous users. However, this reduces the effective mean-time-between-failures (MTBF) of the volume. If this reduction is a problem, use both mirroring and striping. See Section 8.3.2 for mirroring guidelines.
Distribute the subdisks of a striped volume across different buses. This improves performance and helps to prevent a single bus from becoming a bottleneck.
The performance benefit of striping depends on the size of the stripe width and the characteristics of the I/O load. Stripes of data are allocated alternately and evenly to the subdisks of a striped plex. A striped plex consists of a number of equal-sized subdisks located on different disks.
The number of blocks in a stripe determines the stripe width. LSM uses a default stripe width of 64 KB (or 128 sectors), which works well in most environments.
Use the
volstat
command to determine the number of
data transfer splits.
For volumes that receive only small I/O transfers, you
may not want to use striping because disk access time is important.
Striping
is most beneficial for large data transfers.
To improve performance of large sequential data transfers, use a stripe width that will divide each individual data transfer and distribute the blocks equally across the disks.
To improve the performance of multiple simultaneous small data transfers, make the stripe width the same size as the data transfer. However, an excessively small stripe width can result in poor system performance.
If you are striping mirrored volumes, ensure that the stripe width is the same for each plex.
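For large sequential transfers, the guideline above amounts to dividing the typical transfer size evenly across the disks in the striped plex. The following shell function is an illustrative sketch (its name is not an LSM command), and assumes the transfer size is an even multiple of the disk count:

```shell
#!/bin/sh
# Illustrative calculation (not an LSM command): stripe width in KB that
# splits one large data transfer evenly across the disks in a striped plex.
stripe_width_kb() {
    xfer_kb=$1    # typical size of one data transfer, in KB
    ndisks=$2     # number of disks in the striped plex
    echo $((xfer_kb / ndisks))
}

# Example: a 512-KB sequential transfer across a 4-disk stripe set.
stripe_width_kb 512 4
```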
Table 8-7 describes the tools you can use to obtain information about the Logical Storage Manager (LSM).
| Name | Use | Description |
| volprint | Displays LSM configuration information (Section 8.3.4.1) | Displays information about LSM disk groups, disk media, volumes, plexes, and subdisk records. It does not display disk access records. See volprint(8). |
| volstat | Monitors LSM performance statistics (Section 8.3.4.2) | Displays performance statistics since boot time for all LSM objects (volumes, plexes, subdisks, and disks). These statistics include information about read and write operations, including the total number of operations, the number of failed operations, the number of blocks read or written, and the average time spent on the operation in a specified interval of time. See volstat(8). |
| voltrace | Tracks LSM operations (Section 8.3.4.3) | Sets I/O tracing masks against one or all volumes in the LSM configuration and logs the results to the LSM default event log, /dev/volevent. See voltrace(8). |
| volwatch | Monitors LSM events (Section 8.3.4.4) | Monitors LSM for failures in disks, volumes, and plexes, and sends mail if a failure occurs. See volwatch(8). |
| dxlsm | Monitors LSM objects (Section 8.3.4.5) | Using the Analyze menu, displays information about LSM disks, volumes, and subdisks. See dxlsm(8X). |
The following sections describe some of these commands in detail.
The
volprint
utility displays information from records in the LSM configuration
database.
You can select the records to be displayed by name or by using
special search expressions.
In addition, you can display record association
hierarchies, so that the structure of records is more apparent.
Use the
volprint
utility to display disk group, disk
media, volume, plex, and subdisk records.
Invoke the
voldisk list
command to display disk access records or physical disk information.
The following example uses the
volprint
utility to
show the status of the
voldev1
volume:
# /usr/sbin/volprint -ht voldev1
DG NAME       GROUP-ID
DM NAME       DEVICE     TYPE     PRIVLEN  PUBLEN   PUBPATH
V  NAME       USETYPE    KSTATE   STATE    LENGTH   READPOL   PREFPLEX
PL NAME       VOLUME     KSTATE   STATE    LENGTH   LAYOUT    ST-WIDTH  MODE
SD NAME       PLEX       PLOFFS   DISKOFFS LENGTH   DISK-NAME DEVICE

v  voldev1    fsgen      ENABLED  ACTIVE   804512   SELECT    -
pl voldev1-01 voldev1    ENABLED  ACTIVE   804512   CONCAT    -         RW
sd rz8-01     voldev1-01 0        0        804512   rz8       rz8
pl voldev1-02 voldev1    ENABLED  ACTIVE   804512   CONCAT    -         RW
sd dev1-01    voldev1-02 0        2295277  402256   dev1      rz9
sd rz15-02    voldev1-02 402256   2295277  402256   rz15      rz15
See
volprint(8)
for more information.
The
volstat
utility provides information about activity on volumes,
plexes, subdisks, and disks under LSM control.
It reports statistics that
reflect the activity levels of LSM objects since boot time.
The amount of information displayed depends on which options you specify
to the
volstat
utility.
For example, you can display statistics
for a specific LSM object, or you can display statistics for all objects at
one time.
If you specify a disk group, only statistics for objects in that
disk group are displayed.
If you do not specify a particular disk group,
the
volstat
utility displays statistics for the default
disk group (rootdg).
You can also use the
volstat
utility to reset the
base statistics to zero.
This can be done for all objects or for only
specified objects.
Resetting the statistics to zero before a particular
operation makes it possible to measure the subsequent impact of that
operation.
The following example uses the
volstat
utility to
display statistics on LSM volumes:
# /usr/sbin/volstat
                  OPERATIONS          BLOCKS         AVG TIME(ms)
TYP NAME       READ     WRITE     READ     WRITE     READ   WRITE
vol archive     865       807     5722      3809     32.5    24.0
vol home       2980      5287     6504     10550     37.7   221.1
vol local     49477     49230   507892    204975     28.5    33.5
vol src       79174     23603   425472    139302     22.4    30.9
vol swapvol   22751     32364   182001    258905     25.3   323.2
See
volstat(8)
for more information.
The
voltrace
utility reads an event log (/dev/volevent) and prints
formatted event log records to standard output.
Using the
voltrace
utility, you can set event trace masks to determine the type
of events to track.
For example, you can trace I/O events, configuration
changes, or I/O errors.
The following example uses the
voltrace
utility
to display status on all new events:
# /usr/sbin/voltrace -n -e all
18446744072623507277 IOTRACE 439: req 3987131 v:rootvol p:rootvol-01 \
    d:root_domain s:rz3-02 iot write lb 0 b 63120 len 8192 tm 12
18446744072623507277 IOTRACE 440: req 3987131 \
    v:rootvol iot write lb 0 b 63136 len 8192 tm 12
See
voltrace(8)
for more information.
The
volwatch
shell script is automatically started when you install LSM.
This
script sends mail to root if certain LSM configuration events occur, such
as a plex detach caused by a disk failure.
The script sends mail to root
by default.
You also can specify another mail recipient.
See
volwatch(8)
for more information.
The LSM Visual Administrator, dxlsm, is a graphical user interface (GUI) that allows you to manipulate LSM objects and manage the LSM configuration.
The
dxlsm
GUI also includes an Analyze menu that allows you to display statistics
about volumes, LSM disks, and subdisks.
The information is graphically displayed,
using colors and patterns on the disk icons, and numerically, using the
Analysis
Statistics
form.
You can use the
Analysis
Parameters
form to customize the displayed
information.
See the
Logical Storage Manager
manual and
dxlsm(8X)
for more information.
You may be able to improve LSM performance by modifying an LSM subsystem attribute or by performing some administrative tasks. Be sure you have followed the configuration guidelines that are described in Section 8.3.1, Section 8.3.2, and Section 8.3.3.
You can improve LSM performance as follows:
Increase the maximum number of LSM volumes
For large systems, increase the value of the lsm subsystem attribute
max-vol, which specifies the maximum number of volumes per system. The
default is 1024; you can increase it to 4096. See Section 4.4 for
information about modifying kernel subsystem attributes.
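For example, a stanza along the following lines in /etc/sysconfigtab (a
sketch only; Section 4.4 describes the supported procedure for modifying
subsystem attributes) would raise the limit:

```
lsm:
    max-vol = 4096
```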
Balance the I/O load
LSM allows you to achieve a fine level of granularity in data placement, because LSM provides a way for volumes to be distributed across multiple disks. After measuring actual data-access patterns, you can adjust the placement of file systems.
You can reassign data to specific disks to balance the I/O load among the available storage devices. After performance patterns have been established, you can reconfigure volumes online without adversely impacting volume availability.
Stripe frequently accessed data
If you have frequently accessed file systems or databases, you can realize significant performance benefits by striping the data across multiple disks, which increases bandwidth to this data. See Section 8.3.3 for information.
Set the preferred read policy to the fastest mirrored plex
If one plex of a mirrored volume exhibits superior performance, either because it is striped or concatenated across multiple disks or because it is located on a much faster device, set the volume's read policy to prefer that plex. By default, a mirrored volume with one striped plex should be configured with the striped plex as the preferred read plex. See Section 8.3.2.2 for more information.
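As a sketch, setting the preferred plex might look like the following
(the command path and the volume and plex names here are hypothetical;
see the Logical Storage Manager manual for the exact syntax):

```
# /usr/sbin/volume rdpol prefer vol01 vol01-02
```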
You may be able to improve the performance of LSM systems that use large
amounts of memory or disks by increasing the value of the
volinfo.max_io kernel variable. See Section 4.4.6 for information about
using dbx to display and modify kernel variables.
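A hedged sketch of such a dbx session follows (the value shown is purely
illustrative, not a recommendation; Section 4.4.6 describes the
supported procedure):

```
# dbx -k /vmunix /dev/mem
(dbx) print volinfo.max_io
(dbx) assign volinfo.max_io = 4096
```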
Hardware RAID subsystems provide RAID functionality for high performance and high availability, relieve the CPU of disk I/O overhead, and enable you to connect many disks to a single I/O bus. There are various types of hardware RAID subsystems with different performance and availability features, but they all include a RAID controller, disks in enclosures, cabling, and disk management software.
RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features, such as write-back caches and complete component redundancy.
Hardware RAID subsystems use disk management software, such as the RAID Configuration Utility (RCU) and the StorageWorks Command Console (SWCC) utility, to manage the RAID devices. Menu-driven interfaces allow you to select RAID levels.
Use hardware RAID to combine multiple disks into a single storage set that the system sees as a single unit. A storage set can consist of a simple set of disks, a striped set, a mirrored set, or a RAID set. You can create LSM volumes, AdvFS file domains, or UFS file systems on a storage set, or you can use the storage set as a raw device.
All hardware RAID subsystems provide you with the following features:
A RAID controller that relieves the CPU of the disk I/O overhead
Increased disk storage capacity
Hardware RAID subsystems allow you to connect a large number of disks to a single I/O bus. In a typical storage configuration, you use a SCSI bus connected to an I/O bus slot to attach disks to a system. However, systems have limited I/O bus slots, and you can connect only a limited number of disks to a SCSI bus (eight for SCSI-2 and sixteen for SCSI-3).
Hardware RAID subsystems contain multiple internal SCSI buses and host bus adapters, and require only one I/O bus to connect the subsystem to a system.
Read cache
A read cache improves I/O read performance by holding data that it anticipates the host will request. If a system requests data that is already in the read cache (a cache hit), the data is immediately supplied without having to read the data from disk. Subsequent data modifications are written both to disk and to the read cache (write-through caching).
Write-back cache
Hardware RAID subsystems support (as a standard or an optional feature) a write-back cache, which can improve I/O write performance while maintaining data integrity. A write-back cache decreases the latency of many small writes, and can improve Internet server performance because writes appear to be executed immediately. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache, consolidated, and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache must be battery-backed to protect against data loss and corruption.
RAID support
All hardware RAID subsystems support RAID 0 (disk striping), RAID 1 (disk mirroring), and RAID 5. High-performance RAID array subsystems also support RAID 3 and dynamic parity RAID. See Section 1.3.3.1 for information about RAID levels.
Non-RAID disk array capability or "just a bunch of disks" (JBOD)
Component hot swapping and hot sparing
Hot swap support allows you to replace a failed component while the system continues to operate. Hot spare support allows you to automatically use previously installed components if a failure occurs.
Graphical user interface (GUI) for easy management and monitoring
There are various types of hardware RAID subsystems, which provide different degrees of performance and availability at various costs:
Backplane RAID array storage subsystems
These entry-level subsystems, such as those utilizing the RAID Array 230/Plus storage controller, provide a low-cost hardware RAID solution and are designed for small and midsize departments and workgroups.
A backplane RAID array storage controller is installed in an I/O bus slot, either a PCI bus slot or an EISA bus slot, and acts as both a host bus adapter and a RAID controller.
Backplane RAID array subsystems provide RAID functionality (0, 1, 0+1, and 5), an optional write-back cache, and hot swap functionality.
High-performance RAID array subsystems
These subsystems, such as the RAID Array 450 subsystem, provide extensive performance and availability features and are designed for client/server, data center, and medium to large departmental environments.
A high-performance RAID array controller, such as an HSZ50 controller, is connected to a system through a FWD SCSI bus and a high-performance host bus adapter installed in an I/O bus slot.
High-performance RAID array subsystems provide RAID functionality (0, 1, 0+1, 3, 5, and dynamic parity RAID), dual-redundant controller support, scalability, storage set partitioning, multipath concurrent access, a standard battery-backed write-back cache, and hot-swappable components.
Enterprise Storage Arrays (ESA)
These preconfigured high-performance hardware RAID subsystems, such as the RAID Array 7000, provide the highest performance, availability, and disk capacity of any RAID subsystem. They are used for high-volume transaction-oriented applications and high-bandwidth decision-support applications.
ESAs support all major RAID levels, including dynamic parity RAID; fully redundant and hot-swappable components; a standard battery-backed write-back cache; and centralized storage management.
See the Compaq Systems & Options Catalog for detailed information about hardware RAID subsystem features.
Table 8-8 describes the hardware RAID subsystem configuration recommendations and lists performance benefits as well as tradeoffs.
| Recommendation | Performance Benefit | Tradeoff |
| Evenly distribute disks in a storage set across different buses (Section 8.4.1) | Improves performance and helps to prevent bottlenecks | None |
| Configure devices for multipath concurrent access | Improves throughput and increases availability | Cost of devices that support this feature |
| Use disks with the same data capacity in each storage set (Section 8.4.2) | Simplifies storage management | None |
| Use an appropriate stripe size (Section 8.4.3) | Improves performance | None |
| Mirror striped sets (Section 8.4.4) | Provides availability and distributes disk I/O performance | Increases configuration complexity and may decrease write performance |
| Use a write-back cache (Section 8.4.5) | Improves write performance | Cost of hardware |
| Use dual-redundant RAID controllers (Section 8.4.6) | Improves performance, increases availability, and prevents I/O bus bottlenecks | Cost of hardware |
| Install spare disks (Section 8.4.7) | Improves availability | Cost of disks |
| Replace failed disks promptly (Section 8.4.7) | Improves performance | None |
The following sections describe some of these guidelines. See your RAID subsystem documentation for detailed configuration information.
You can improve performance and help to prevent bottlenecks by distributing storage set disks evenly across different buses.
In addition, make sure that the first member of each mirrored set is on a different bus.
Use disks with the same capacity in the same storage set. This simplifies storage management.
You must understand how your applications perform disk I/O before you can choose the stripe (chunk) size that will provide the best performance benefit. See Section 2.1 for information about determining a resource model for your system.
If the stripe size is large compared to the average I/O size, each disk in a stripe set can respond to a separate data transfer. I/O operations can then be handled in parallel, which increases sequential write performance and throughput. This can improve performance for environments that perform large numbers of I/O operations, including transaction processing, office automation, and file services environments, and for environments that perform multiple random read and write operations.
If the stripe size is smaller than the average I/O operation, multiple disks can simultaneously handle a single I/O operation, which can increase bandwidth and improve sequential file processing. This is beneficial for image processing and data collection environments. However, do not make the stripe size so small that it will degrade performance for large sequential data transfers.
For example, if you use an 8-KB stripe size, small data transfers will be distributed evenly across the member disks, but a 64-KB data transfer will be divided into at least eight data transfers.
If your applications are doing I/O to a raw device and not a file system, use a stripe size that distributes a single data transfer evenly across the member disks. For example, if the typical I/O size is 1 MB and you have a four-disk array, you could use a 256-KB stripe size. This would distribute the data evenly among the four member disks, with each doing a single 256-KB data transfer in parallel.
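As a quick check of this arithmetic, the following shell fragment
(illustrative only; the sizes match the example above) computes how many
member-disk transfers a single raw I/O generates for a given stripe
size:

```shell
# Illustrative arithmetic only: split one raw I/O across a stripe set.
io_size=$((1024 * 1024))     # typical raw I/O size: 1 MB
stripe_size=$((256 * 1024))  # stripe (chunk) size: 256 KB
disks=4                      # member disks in the array

# Number of per-disk transfers that one I/O is divided into
transfers=$(( (io_size + stripe_size - 1) / stripe_size ))
echo "transfers per I/O: $transfers"  # one 256-KB transfer per member disk
```

With these sizes, the single 1-MB transfer is divided into four 256-KB
transfers, one per member disk, all proceeding in parallel.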
For small file system I/O operations, use a stripe size that is a multiple of the typical I/O size (for example, four to five times the I/O size). This will help to ensure that the I/O is not split across disks.
You may want to choose a stripe size that will prevent any particular range of blocks from becoming a bottleneck. For example, if an application often uses a particular 8-KB block, you may want to use a stripe size that is slightly larger or smaller than 8 KB or is a multiple of 8 KB to force the data onto a different disk.
Striped disks improve I/O performance by distributing the disk I/O load. Mirroring striped disks provides high availability, but can decrease write performance, because each write operation results in two disk writes.
RAID subsystems support, either as a standard or an optional feature, a nonvolatile (battery-backed) write-back cache that can improve disk I/O performance while maintaining data integrity. A write-back cache improves performance for systems that perform large numbers of writes. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache improves performance because writes appear to be executed immediately. If a failure occurs, upon recovery the RAID controller detects any unwritten data that still exists in the write-back cache and writes the data to disk before enabling normal controller operations.
A write-back cache must be battery-backed to protect against data loss and corruption.
If you are using an HSZ40 or HSZ50 RAID controller with a write-back cache, the following guidelines may improve performance:
Set CACHE_POLICY to B.
Set CACHE_FLUSH_TIMER to a minimum of 45 (seconds).
Enable the write-back cache (WRITEBACK_CACHE) for each unit, and set the value of MAXIMUM_CACHED_TRANSFER_SIZE to a minimum of 256.
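On the controller's command line interface, these settings correspond to
commands along the following lines (a sketch only; the unit name D100 is
hypothetical, and the exact syntax is given in the controller
documentation):

```
SET THIS_CONTROLLER CACHE_POLICY=B
SET THIS_CONTROLLER CACHE_FLUSH_TIMER=45
SET D100 WRITEBACK_CACHE
SET D100 MAXIMUM_CACHED_TRANSFER_SIZE=256
```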
See the RAID subsystem documentation for more information about using the write-back cache.
If supported by your RAID subsystem, you can use a dual-redundant controller configuration and balance the number of disks across the two controllers. This can improve performance, increase availability, and prevent I/O bus bottlenecks.
Install predesignated spare disks on separate controller ports and storage shelves. This will help you to maintain data availability and recover quickly if a disk failure occurs.
The Common Access Method (CAM) is the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1 KB of physical memory. Monitor these pools (see Section 8.2.2) and tune them if necessary.
You can modify the following io subsystem attributes to improve CAM
performance:
cam_ccb_pool_size--The initial size of the buffer pool free list at boot
time. The default is 200.
cam_ccb_low_water--The number of buffers in the pool free list at which
more buffers are allocated from the kernel. CAM reserves this number of
buffers to ensure that the kernel always has enough memory to shut down
runaway processes. The default is 100.
cam_ccb_increment--The number of buffers added to or removed from the
buffer pool free list. Buffers are allocated on an as-needed basis to
handle immediate demands, but are released in a more measured manner to
guard against spikes. The default is 50.
If the I/O pattern associated with your system tends to have
intermittent bursts of I/O operations (I/O spikes), increasing the
values of the cam_ccb_pool_size and cam_ccb_increment attributes may
improve performance.
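For example, a stanza along these lines in /etc/sysconfigtab (the values
are illustrative, not recommendations) doubles both attributes relative
to their defaults:

```
io:
    cam_ccb_pool_size = 400
    cam_ccb_increment = 100
```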