7    LSM Preventative Maintenance Procedures

LSM-related preventative maintenance procedures enable you to restore your LSM configuration if a disk or system fails. Preventative maintenance procedures include using the redundancy features built into LSM, backing up your system regularly, backing up copies of critical data needed to reconstruct your LSM configuration, and monitoring the LSM software.

This chapter describes LSM-related preventative maintenance procedures that you should perform.

7.1    The LSM Hot-Sparing Feature

The LSM software supports a hot-sparing feature that enables you to configure a system to react automatically to failures on mirrored or RAID5 LSM objects.

With hot-sparing enabled, LSM detects failures on LSM objects and relocates the affected subdisks to designated spare disks or free disk space within the disk group. LSM then reconstructs the LSM objects that existed before the failure and makes them redundant and accessible.

When a partial disk failure occurs (that is, a failure affecting only some subdisks on a disk), redundant data on the failed portion of the disk is relocated, and the existing volumes composed of the unaffected portions of the disk remain accessible.

By default, hot-sparing is disabled. To enable hot-sparing, enter the following command:

# volwatch -s

Note

Only one volwatch daemon can run on a system or cluster node at any time.
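
To confirm whether a volwatch daemon is already running before you start another one, you can check the process list. This is a quick check and assumes that the volwatch script appears by name in the ps output:

# ps -e | grep volwatch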

7.1.1    Hot-Spare Overview

The volwatch daemon monitors for failures involving LSM disks, plexes, or RAID5 subdisks. When such a failure occurs, the volwatch daemon sends mail notification and, if hot-sparing is enabled, attempts to relocate the affected subdisks, as described in the following sections.

Note

Hot-sparing is only performed for redundant (mirrored or RAID5) subdisks on a failed disk. Non-redundant subdisks on a failed disk are not relocated, but you are notified of the failure.

Hot-sparing does not guarantee the same layout of data or the same performance after relocation. You may want to make some configuration changes after hot-sparing occurs.

7.1.1.1    Mail Notification

When an exception event occurs, the volwatch command uses mailx(1) to send mail to the root login and to any other user accounts specified with the VOLWATCH_USERS variable, as described later in this chapter.

There is a 15-second delay before the failure is analyzed and the message is sent. This delay allows a group of related events to be collected and reported in a single mail message. The following is a sample mail notification sent when a failure is detected:

Failures have been detected by the Logical Storage Manager:  
 
failed disks:
 
medianame
 ...
 
failed plexes:
 
plexname
 
 ...
 
failed log plexes:
 
plexname
 
 ...
 
failing disks:
 
medianame
 ...
 
failed subdisks:
 
subdiskname
 
 ...
 
The Logical Storage Manager will attempt to find spare disks, 
relocate failed subdisks and then recover the data in the failed plexes.

The following list describes the sections of the mail message:

If a disk fails completely, the mail message lists the disk that has failed and all plexes that use the disk. For example:

To: root
Subject: Logical Storage Manager failures on mobius.lsm.com
 
Failures have been detected by the Logical Storage Manager
 
failed disks:
 disk02
 
failed plexes:
  home-02
  src-02
  mkting-01
 
failing disks:
  disk02
 
 

This message shows that disk02 was detached by a failure, and that plexes home-02, src-02, and mkting-01 were also detached (probably due to a disk failure).

If a plex or disk is detached by a failure, the mail message lists the failed objects. If a partial disk failure occurs, the mail message identifies the failed plexes. For example, if a disk containing mirrored volumes experiences a failure, a mail message similar to the following is sent:

To: root
Subject: Logical Storage Manager failures on mobius.lsm.com
 
Failures have been detected by the Logical Storage Manager:
 
failed plexes:
  home-02
  src-02
 
 

To determine which disks are causing the failures in this message, enter:

# volstat -sff home-02 src-02

This produces output such as the following:

                            FAILED
TYP NAME                READS    WRITES
sd  disk01-04               0         0
sd  disk01-06               0         0
sd  disk02-03               1         0
sd  disk02-04               1         0

This display indicates that the failures are on disk02 and that subdisks disk02-03 and disk02-04 are affected.

Hot-sparing automatically relocates the affected subdisks and initiates any necessary recovery procedures. However, if relocation is not possible or the hot-sparing feature is disabled, you must investigate the problem and recover the plexes. Sometimes these errors are caused by cabling failures. Check the cables connecting your disks to your system. If there are any obvious problems, correct them and recover the plexes with the following command:

# volrecover -b volhome volsrc

This command starts a recovery of the failed plexes in the background (the command returns before the operation is done). If an error message appears, or if the plexes become detached again, you must replace the disk.
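
Because the recovery runs in the background, you can check its progress by examining the plex states with the volprint command, using the same volume names as in the previous example:

# volprint -vt volhome volsrc

When the recovery is complete, the plexes should return to the ACTIVE state.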

7.1.2    Initializing Spare Disks

To use hot-sparing, you should configure one or more disks as spares, which identifies them as available sites for relocating failed subdisks. The LSM software does not use disks that are identified as spares for normal allocations unless you explicitly specify otherwise. This ensures that there is a pool of spare disk space available for relocating failed subdisks.

Spare disk space is the first space used to relocate failed subdisks. However, if no spare disk space is available or if the available spare disk space is not suitable or sufficient, free disk space is used.

You must initialize a spare disk and place it in a disk group as a spare before it can be used for replacement purposes. If no disks are designated as spares when a failure occurs, LSM automatically uses any available free disk space in the disk group in which the failure occurs. If there is not enough spare disk space, a combination of spare disk space and free disk space is used.

When hot-sparing selects a disk for relocation, it preserves the redundancy characteristics of the LSM object to which the relocated subdisk belongs. For example, hot-sparing ensures that subdisks from a failed plex are not relocated to a disk containing a mirror of the failed plex. If redundancy cannot be preserved using the available spare disks or free disk space, hot-sparing does not take place, and mail is sent indicating that no action was taken.

When hot-sparing takes place, the failed subdisk is removed from the configuration database and LSM takes precautions to ensure that the disk space used by the failed subdisk is not recycled as free disk space.

Follow these guidelines when choosing a disk to configure as a spare:

To initialize a disk that has no associated subdisks as a spare, use the voldiskadd command and enter y at the following prompt:

Add disk as a spare disk for newdg? [y,n,q,?] (default: n) y

To initialize an existing LSM disk as a spare disk, enter:

# voledit set spare=on medianame

For example, to initialize a disk called test03 as a spare disk, enter:

# voledit set spare=on test03

To remove a disk as a spare, enter:

# voledit set spare=off medianame

For example, to make a disk called test03 available for normal use, enter:

# voledit set spare=off test03

7.2    Replacement Procedure

In the event of a disk failure, mail is sent and, if volwatch was started with the -s option to enable hot-sparing, volwatch attempts to relocate any subdisks that appear to have failed. This involves finding an appropriate spare disk or free disk space in the same disk group as the failed subdisk.

To determine which of the eligible spare disks to use, volwatch tries to use the disk that is closest to the failed disk. Closeness depends on the controller, target, and disk number of the failed disk. For example, a disk on the same controller as the failed disk is closer than a disk on a different controller, and a disk under the same target as the failed disk is closer than one under a different target.

If no spare or free disk space is found, the following mail message is sent explaining the disposition of volumes on the failed disk:

Relocation was not successful for subdisks on disk dm_name
in volume v_name in disk group dg_name.  
No replacement was made and the disk is still unusable. 
 
The following volumes have storage on medianame:
 
volumename
...
 
These volumes are still usable, but the redundancy of 
those volumes is reduced. Any RAID5 volumes with storage 
on the failed disk may become unusable in the face of further 
failures.

If non-RAID5 volumes are made unusable due to the failure of the disk, the following is included in the mail message:

The following volumes:
 
volumename
...
 
have data on medianame but have no other usable 
mirrors on other disks. These volumes are now unusable 
and the data on them is unavailable.  These volumes must 
have their data restored.

If RAID5 volumes are made unavailable due to the disk failure, the following is included in the mail message:

The following RAID5 volumes:
 
volumename
...
 
have storage on medianame and have experienced 
other failures. These RAID5 volumes are now unusable 
and data on them is unavailable.  These RAID5 volumes must 
have their data restored.

If spare disk space is found, LSM attempts to set up a subdisk on the spare disk and use it to replace the failed subdisk. If this is successful, the volrecover command runs in the background to recover the contents of the volumes on the failed disk.

If the relocation fails, the following mail message is sent:

Relocation was not successful for subdisks on disk dm_name in
volume v_name in disk group dg_name. 
No replacement was made and the disk is still unusable.
 
error message

If any volumes are rendered unusable due to the failure, the following is included in the mail message:

The following volumes:
 
volumename
...
 
have data on dm_name but have no other usable mirrors 
on other disks. These volumes are now unusable and the data on them is 
unavailable. These volumes must have their data restored.

If the relocation procedure completes successfully and recovery is under way, the following mail message is sent:

Volume v_name Subdisk sd_name relocated to newsd_name, 
but not yet recovered.

Once recovery has completed, a message is sent relaying the outcome of the recovery procedure. If the recovery was successful, the following is included in the mail message:

Recovery complete for volume v_name in disk group dg_name.

If the recovery was not successful, the following is included in the mail message:

Failure recovering v_name in disk group dg_name.

7.2.1    Moving Relocated Subdisks

When hot-sparing occurs, subdisks are relocated to spare disks or available free disk space within the disk group. The new subdisk locations may not provide the same performance or data layout that existed before hot-sparing took place. After a hot-spare procedure is completed, you may want to move the relocated subdisks to improve performance, to keep the spare disk space free for future hot-spare needs, or to restore the configuration to its previous state.
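
For example, after you replace the failed disk, you might move the relocated storage off the spare disk and back onto the replacement disk. The following is a sketch only; it assumes that the volassist move operation described in volassist(8) is available, and it uses hypothetical names (a volume called volhome, a spare disk called disk05, and a replacement disk called disk02):

# volassist move volhome !disk05 disk02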

7.3    Save the LSM Configuration

It is recommended that you use the volsave command to create a copy of the current LSM configuration on a regular basis. You can use the volrestore command to re-create the configuration if a disk group's configuration is lost.

The volsave command only saves information about the LSM configuration; it does not save:

The volsave command saves information about an LSM configuration in a set of files called a description set, which by default is stored in a timestamped directory under /usr/var/lsm/db.

When you run volsave, a description set is created, which consists of the following files:

To create a backup copy of the current LSM configuration in the default backup directory, enter:

# volsave

Output similar to the following is displayed:

LSM configuration being saved to
   /usr/var/lsm/db/LSM.19991226203620.skylark
 
volsave does not save configuration for volumes used for
root, swap, /usr or /var.
LSM configuration for following system disks not saved:
dsk8a dsk8b
 
LSM Configuration saved successfully.
 
 
 

To verify the save, enter:

# cd /usr/var/lsm/db/LSM.19991226203620.skylark

# ls

Output similar to the following is displayed:

dg1.d     header  volboot dg2.d    rootdg.d  voldisk.list

In this example, the volsave command created the following files and directories:

To save the LSM configuration in a timestamped subdirectory under a directory other than the default, such as /usr/var/config, enter:

# volsave -d /usr/var/config/LSM.%date

Output similar to the following is displayed:

LSM configuration being saved to
    /usr/var/config/LSM.19991226203658

.
.
.
LSM Configuration saved successfully.

To save an LSM configuration to a specific directory called /usr/var/LSM.config1, enter:

# volsave -d /usr/var/LSM.config1

Output similar to the following is displayed:

   LSM configuration being saved to /usr/var/LSM.config1

.
.
.
LSM Configuration saved successfully.
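
If the disk group configuration is later lost, you can use the volrestore command to re-create it from a saved description set. The following is a sketch, assuming that the -d option of volrestore identifies the directory containing the saved description set (see volrestore(8)):

# volrestore -d /usr/var/LSM.config1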

7.4    Back Up Volumes

Backing up a volume requires a mirrored plex that is at least large enough to store the complete contents of the volume. Using a smaller plex results in an incomplete copy.

Note

Use the AdvFS backup utilities instead of the LSM methods to back up volumes used with AdvFS.

The methods described in this section do not apply to RAID5 volumes.

7.4.1    Backing Up A Non-Mirrored Volume

Follow these steps to back up a non-mirrored volume:

  1. Ensure there is enough free disk space to create another plex for the volume that you want to back up. You can determine this by comparing the output of the voldg free command for the disk group with the output of the volprint -vt command for the volume (see the example after this procedure).

  2. Create a snapshot plex by entering the following command:

    # volassist snapstart volume_name

    For example, to create a snapshot plex for the volume called vol01, enter:

    # volassist snapstart vol01

    This creates a write-only backup plex that is attached to and synchronized with the specified volume.

  3. When the snapstart operation is complete, the plex state changes to SNAPDONE. You can then complete the snapshot operation. Select a convenient time and inform users of the upcoming snapshot. Warn them to save files and refrain from using the system briefly during that time.

    When you are ready to create the snapshot, make sure there is no activity on the volume. For UFS volumes, it is recommended that you unmount the file system briefly to ensure that the snapshot data on disk is consistent and complete (all cached data has been flushed to the disk).

  4. Create a snapshot volume that reflects the original volume by entering the following command:

    # volassist snapshot volume_name temp_volume_name

    For example, to create a temporary volume called vol01-temp for a volume called vol01, enter:

    # volassist snapshot vol01 vol01-temp

    This operation detaches the finished snapshot (which becomes a normal plex), creates a new normal volume, and attaches the snapshot plex to it. The snapshot then becomes a normal, functioning plex with a state of ACTIVE. At this point, you can mount and resume normal use of the volume.

  5. Check the temporary volume's contents. For example, to check the UFS file system on a volume called vol01-temp, enter:

    # fsck -p /dev/rvol/rootdg/vol01-temp

  6. Perform the backup by entering the following command:

    # dump 0 /dev/rvol/disk-group/temp_volume_name

    For example, to back up a volume called vol01-temp in the rootdg disk group, enter:

    # dump 0 /dev/rvol/rootdg/vol01-temp

  7. After the backup is completed, remove the temporary volume by entering the following commands:

    # volume stop temp_volume_name 
    # voledit -r rm temp_volume_name
    

    For example, to remove a temporary volume called vol01-temp, enter:

    # volume stop vol01-temp 
    # voledit -r rm vol01-temp
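
For step 1 of this procedure, you can compare the free space in the disk group with the size of the volume you want to back up. The following is a sketch using a hypothetical volume called vol01 in the rootdg disk group:

# voldg free
# volprint -vt vol01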
    

7.4.2    Backing Up A Mirrored Volume

If a volume is mirrored, you can back up the volume by temporarily dissociating one of the plexes from the volume. This method eliminates the need for extra disk space for the purpose of backup only.

Warning

If the volume has only two plexes, redundancy is not available during the time of the backup.

Follow these steps to back up a mirrored volume:

  1. Stop all activity on the volume. For UFS volumes, it is recommended that you unmount the file system briefly to ensure that the data on disk is consistent and complete (all cached data has been flushed to the disk).

  2. Dissociate one of the volume's plexes by entering the following command:

    # volplex dis plex_name

    For example, to dissociate a plex called vol01-02, enter:

    # volplex dis vol01-02

    This operation usually takes only a few seconds. It leaves the plex called vol01-02 available as an image of the volume frozen at the time of dissociation.

  3. Mount and resume normal use of the volume.

  4. Create a temporary volume by entering the following commands:

    # volmake -Ufsgen vol temp_volume_name plex=plex_name 
    # volume start temp_volume_name
    

    For example, to create a temporary volume called vol01-temp using a plex called vol01-02, enter:

    # volmake -Ufsgen vol vol01-temp plex=vol01-02 
    # volume start vol01-temp
    

  5. Check the temporary volume by entering the following command:

    # fsck -p /dev/rvol/rootdg/temp_volume_name

  6. Perform the backup using the temporary volume by entering the following command:

    # dump 0 /dev/rvol/rootdg/temp_volume_name

  7. After the backup is completed, remove the temporary volume and reattach the plex by entering the following commands:

    # volplex dis plex_name 
    # voledit -r rm temp_volume_name 
    # volplex att volume_name plex_name &
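
    For example, to remove a temporary volume called vol01-temp and reattach a plex called vol01-02 to a volume called vol01, enter:

    # volplex dis vol01-02 
    # voledit -r rm vol01-temp 
    # volplex att vol01 vol01-02 &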
    

7.5    Collect LSM Performance Data

LSM provides two types of performance information: I/O statistics, which you gather with the volstat command, and I/O traces, which you gather with the voltrace command.

Note

In a cluster environment, volstat and voltrace report statistics relative to the node on which the commands are executed. These commands do not provide statistics for all the nodes within a cluster.

7.5.1    Gathering I/O Statistics

The volstat command provides access to information about activity on volumes, plexes, subdisks, and disks used with the LSM software. You can use the volstat command to report I/O statistics for LSM objects since system boot or for specified time intervals. You can display statistics for a specific LSM object or for all objects. If a disk group is specified, statistics are displayed only for objects in that disk group; otherwise, statistics for the default disk group (rootdg) are displayed.

The amount of information displayed depends on the options you specify with the volstat command. You can also reset the statistics information to zero, which is useful for measuring the impact of a particular operation. For information on the available options, see the volstat(8) reference page.
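
For example, to reset the statistics to zero before measuring a particular operation, you can enter the following command. This is a sketch that assumes the -r option performs the reset, as described in volstat(8):

# volstat -r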

7.5.1.1    Statistics Recorded by LSM

The LSM software records the following three I/O statistics: a count of operations, the number of blocks transferred, and the average operation time.

The LSM software records these statistics for logical I/O for each volume. The statistics are recorded for the following types of operations: reads, writes, atomic copies, verified reads, verified writes, plex reads, and plex writes.

For example, one write to a two-plex volume results in at least five operations -- one for each plex, one for each subdisk, and one for the volume. Similarly, one read that spans two subdisks shows at least four reads -- one read for each subdisk, one for the plex, and one for the volume.

The LSM software also maintains other statistical data, such as read and write failures for each mirror, and corrected read and write failures for each volume.

To display statistical data for volumes, enter:

# volstat

Output similar to the following is displayed:


 
                 OPERATIONS           BLOCKS        AVG TIME(ms)
TYP NAME     READ     WRITE      READ     WRITE   READ  WRITE
vol v1          3        72        40     62541    8.9   56.5
vol v2          0        37         0       592    0.0   10.5

To display statistical data for all LSM objects, enter:

# volstat -vpsd

Output similar to the following is displayed:

                  OPERATIONS           BLOCKS        AVG TIME(ms)
TYP NAME      READ     WRITE      READ     WRITE   READ  WRITE
dm  dsk6        3        82        40     62561    8.9   51.2 
dm  dsk7        0       725         0    176464    0.0   16.3 
dm  dsk9      688        37    175872       592    3.9    9.2 
dm  dsk10   29962         0   7670016         0    4.0    0.0 
dm  dsk12       0     29962         0   7670016    0.0   17.8 
vol v1          3        72        40     62541    8.9   56.5 
pl  v1-01       3        72        40     62541    8.9   56.5 
sd  dsk6-01     3        72        40     62541    8.9   56.5 
vol v2          0        37         0       592    0.0   10.5 
pl  v2-01       0        37         0       592    0.0    8.0 
sd  dsk7-01     0        37         0       592    0.0    8.0 
sd  dsk12-01    0         0         0         0    0.0    0.0 
pl  v2-02       0        37         0       592    0.0    9.2 
sd  dsk9-01     0        37         0       592    0.0    9.2 
sd  dsk10-01    0         0         0         0    0.0    0.0 
pl  v2-03       0         6         0        12    0.0   13.3 
sd  dsk6-02     0         6         0        12    0.0   13.3 
 
 
 

To display statistics for volumes in the rootdg disk group at one-second intervals, enter:

# volstat -i 1

Output similar to the following is displayed:

                        
                    OPERATIONS           BLOCKS        AVG TIME(ms)
TYP NAME        READ     WRITE      READ     WRITE   READ  WRITE
 
Mon Jun  8 15:11:16 1998
vol v1             3        72        40     62541    8.9   56.5 
vol v2         14015        37     14015       592    0.3   10.5 
 
Mon Jun  8 15:11:17 1998
vol v1             0         0         0         0    0.0    0.0 
vol v2          2606         0      2606         0    0.3    0.0 

To display error statistics on a volume called testvol, enter:

# volstat -f cf testvol

Output similar to the following is displayed:


 
                  CORRECTED           FAILED
TYP NAME        READS   WRITES    READS   WRITES
vol testvol         1        0        0        0

Additional volume statistics are available for RAID5 configurations. See the volstat(8) reference page for details.

7.6    Monitor LSM Events and Configuration Changes

You can use the volwatch and volnotify commands to monitor LSM events and configuration changes.

The volwatch shell script sends mail to the root login (default) and other user accounts that you specify when certain LSM configuration events occur, such as a plex detach caused by a disk failure.

To specify another mail recipient or multiple mail recipients, use the rcmgr command to set the rc.config.common variable VOLWATCH_USERS. The LSM volwatch script uses VOLWATCH_USERS whenever the system is booted or LSM is restarted.

To specify a user named user1@mail.com as a mail recipient, enter:

# rcmgr -c set VOLWATCH_USERS root@dec.com user1@mail.com
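
To verify the setting, you can display the current value of the variable (the rcmgr get operation reads the variable back):

# rcmgr -c get VOLWATCH_USERS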

LSM events are sent to the EVM event management system by the volnotify command. The volnotify command is integrated with EVM by default and runs automatically when LSM starts.

The following LSM events are reported by the volnotify command within EVM:

The LSM volnotify events reported within EVM are configured through the rc.config.common variable LSM_EVM_OPTS. Do not change the LSM_EVM_OPTS settings under normal circumstances, because certain system software depends on these values for operation.

Note

In cluster environments, the volnotify command reports only the LSM events that occur locally on a node. Therefore, to receive LSM events that occur anywhere within the cluster, do not disable the volnotify integration with EVM.

Subscribers can display LSM events through the LSM volnotify EVM template called lsm.volnotify.evt. This EVM template is used to display LSM events in cluster and non-cluster environments.

To display LSM events from the EVM log, enter:

# evmget -f "[name *.volnotify]" | evmshow -t \ 
"@timestamp @@"

To display LSM events in real time, enter:

# evmwatch -f "[name *.volnotify]" | evmshow -t \ 
"@timestamp @@"

For more information, see the volnotify(8), volwatch(8), and EVM(5) reference pages.

7.7    Monitor Volume States

You can use the volprint command to display volume information. The volprint command displays state information that indicates a variety of normal and exceptional conditions.

7.7.1    Volume States

Volume states indicate whether the volume is initialized, whether it has been written to, and whether it is accessible. Table 7-1 describes the volume states.

Table 7-1:  Volume States

State Means
EMPTY The volume contents are not initialized. The kernel state is always DISABLED when the volume is EMPTY.
CLEAN The volume is not started (kernel state is DISABLED) and its plexes are synchronized.
ACTIVE The volume was started (kernel state is currently ENABLED) or was in use (kernel state was ENABLED) when the machine was rebooted. If the volume is currently ENABLED, the state of its plexes at any moment is not certain (since the volume is in use). If the volume is currently DISABLED, this means that the plexes cannot be guaranteed to be consistent, but are made consistent when the volume is started.
SYNC The volume is either in read-writeback recovery mode (kernel state is currently ENABLED) or was in read-writeback mode when the machine was rebooted (kernel state is DISABLED). With read-writeback recovery, plex consistency is recovered by reading data from blocks of one plex and writing the data to all other writable plexes. If the volume is ENABLED, this means that the plexes are being resynchronized via the read-writeback recovery. If the volume is DISABLED, it means that the plexes were being resynchronized via read-writeback when the machine rebooted and therefore, still need to be synchronized.
NEEDSYNC The volume requires a resynchronization operation the next time it starts.

The interpretation of these states during volume startup is modified by the persistent state log for the volume (for example, the DIRTY/CLEAN flag). If the clean flag is set, this means that an ACTIVE volume was not written to by any processes or was not even open at the time of the reboot; therefore, it is considered CLEAN. The clean flag is always set when the volume is marked CLEAN.
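
To display the current state and kernel state of each volume, you can use the volprint command described earlier; for example:

# volprint -vt

The output includes state information for each volume listed.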

7.7.2    RAID5 Volume States

RAID5 volumes have their own set of volume states as described in Table 7-2.

Table 7-2:  RAID5 Volume States

State Means
EMPTY The volume contents are not initialized. The kernel state is always DISABLED when the volume is EMPTY.
CLEAN The volume is not started (kernel state is DISABLED) and its parity is good. The RAID-5 plex stripes are consistent.
ACTIVE The volume was started (kernel state is currently ENABLED) or was in use (kernel state was ENABLED) when the system was rebooted. If the volume is currently ENABLED, the state of its RAID-5 plex at any moment is not certain (since the volume is in use). If the volume is currently DISABLED, this means that the parity synchronization is not guaranteed.
SYNC The volume is either undergoing a parity resynchronization (kernel state is currently ENABLED) or was having its parity resynchronized when the machine was rebooted (kernel state is DISABLED).
NEEDSYNC The volume requires a parity resynchronization operation the next time it is started.
REPLAY The volume is in a transient state as part of a log replay. A log replay occurs when it becomes necessary to use logged parity and data.

7.7.3    Volume Kernel States

The volume kernel state indicates the accessibility of the volume. The volume kernel state allows a volume to have an offline (DISABLED), maintenance (DETACHED), or online (ENABLED) mode of operation. Table 7-3 describes the volume kernel states.

Table 7-3:  Volume Kernel States

State Means
ENABLED Read and write operations can be performed.
DISABLED The volume cannot be accessed.
DETACHED Read and write operations cannot be performed, but plex device operations and ioctl functions are accepted.

7.8    Monitor Plex States

You can use the volprint command to display plex information. The volprint command displays state information that indicates a variety of normal and exceptional conditions.

7.8.1    Plex States

Each data plex associated with a volume has its state set to one of the values listed in Table 7-4.

LSM utilities use plex states to:

Although the LSM utilities automatically maintain a plex's state, you may need to manually change the plex state. For example, if a disk begins to fail, you can temporarily disable a plex located on the disk.
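
For example, to temporarily disable a plex called vol01-02 so that it is no longer used for I/O, you can use the volmend off command (described under the OFFLINE state in Table 7-4); vol01-02 is a hypothetical plex name:

# volmend off vol01-02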

Table 7-4:  LSM Plex States

State Means
EMPTY The plex is not initialized. This state is set when the volume state is also EMPTY.
CLEAN The plex was running normally when the volume was stopped. The plex is enabled without requiring recovery when the volume is started.
ACTIVE The plex is running normally on a started volume. The plex condition flags (NODAREC, REMOVED, RECOVER, and IOFAIL) may apply if the system is rebooted and the volume restarted.
STALE The plex was detached, either by the volplex det command or by an I/O failure. The volume start command changes the state for a plex to STALE if any of the plex condition flags are set. STALE plexes are reattached automatically by calling volplex att when a volume starts.
OFFLINE The plex was disabled by the volmend off command. See volmend(8) for more information.
SNAPATT This is a snapshot plex that is attached by the volassist snapstart command. When the attach is complete, the state for the plex is changed to SNAPDONE. If the system fails before the attach completes, the plex and all of its subdisks are removed.
SNAPDONE This is a snapshot plex created by the volassist snapstart command that is fully attached. You can turn a plex in this state into a snapshot volume with the volassist snapshot command. If the system fails before the attach completes, the plex and all of its subdisks are removed. See volassist(8) for more information.
SNAPTMP This is a snapshot plex that was attached by the volplex snapstart command. When the attach is complete, the state for the plex changes to SNAPDIS. If the system fails before the attach completes, the plex is dissociated from the volume.
SNAPDIS This is a snapshot plex created by the volplex snapstart command that is fully attached. You can turn a plex in this state into a snapshot volume with the volplex snapshot command. If the system fails before the attach completes, the plex is dissociated from the volume. See volplex(8) for more information.
TEMP This is a plex that is associated and attached to a volume with the volplex att command. If the system fails before the attach completes, the plex is dissociated from the volume.
TEMPRM This is a plex that is being associated and attached to a volume with the volplex att command. If the system fails before the attach completes, the plex is dissociated from the volume and removed; however, any subdisks in the plex are kept.
TEMPRMSD This is a plex that is associated and attached to a volume with the volplex att command. If the system fails before the attach completes, the plex and its subdisks are dissociated from the volume and removed.

7.8.2    Plex State Cycle

During normal LSM operation, plexes automatically cycle through a series of states. At system startup, volumes are automatically started and the volume start operation makes all CLEAN plexes ACTIVE. If all goes well until shutdown, the volume-stopping operation marks all ACTIVE plexes CLEAN and the cycle continues.

Deviations from this cycle indicate abnormalities that the LSM software attempts to normalize, for example:

7.8.3    Plex Kernel States

The plex kernel state indicates the accessibility of the plex. The plex kernel state is monitored in the volume driver and allows a plex to have an offline (DISABLED), maintenance (DETACHED), or online (ENABLED) mode of operation. No user intervention is required to set these states; they are maintained internally. On a system that is operating properly, all plexes are set to ENABLED.

Table 7-5 describes the plex kernel states.

Table 7-5:  Plex Kernel States

State Means
ENABLED A write request to the volume is reflected to the plex if the plex is set to ENABLED for write mode. A read request from the volume is satisfied from the plex if the plex is set to ENABLED.
DISABLED The plex cannot be accessed.
DETACHED A write to the volume is not reflected to the plex. A read request from the volume will never be satisfied from the plex device. Plex operations and ioctl functions are accepted.

7.9    Monitor LSM Daemons

The vold and voliod daemons must be running for the LSM software to properly work. These daemons are normally started automatically when the system boots.

To determine the state of the volume daemon, enter:

# voldctl mode

The following table shows the messages that might be displayed and the possible actions to take if vold is disabled or not running.

Message from voldctl mode    Status of vold           How to change

mode:enabled                 Running and enabled      --

mode:disabled                Running, but disabled    voldctl enable

mode:not-running             Not running              vold

See the vold(8) reference page for more information on the vold daemon.

The volume extended I/O daemon (voliod) allows for some extended I/O operations without blocking calling processes. The correct number of voliod daemons is automatically started when LSM is started. There are typically several voliod daemons running at all times. It is recommended that you run at least one voliod daemon for each processor on the system.

Follow these steps to check and/or change the voliod daemons:

  1. Display the current voliod state by entering the following command:

    # voliod

    This is the only method for checking on voliod, because the voliod processes are kernel threads and are not visible as output of the ps command.

    Output similar to the following may display:

    0 volume I/O daemons running

  2. If no voliod daemons are running, or if you want to change the number of daemons, enter the following command:

    # voliod set n

    Where n is the number of I/O daemons. Set the number of LSM I/O daemons to either two or the number of central processing units (CPUs) on the system, whichever is greater. For example, on a single CPU system, enter:

    # voliod set 2

    On a four CPU system, enter:

    # voliod set 4

  3. Verify the change by entering the following command:

    # voliod

    Output similar to the following should display:

    2 volume I/O daemons running

See the voliod(8) reference page for more information on the voliod daemon.

7.10    Trace LSM I/O Operations

You can use the voltrace command to trace volume operations. With the voltrace command, you can set I/O tracing masks against a group of volumes or against the system as a whole, and then display ongoing I/O operations relative to those masks.

The trace records for each physical I/O show a volume and buffer-pointer combination that enables you to track each operation even though the traces may be interspersed with other operations. Like the I/O statistics for a volume, the I/O trace statistics include records for each physical I/O done, and a logical record that summarizes all physical records. For additional information, see the voltrace(8) reference page.

To trace volumes, enter:

# voltrace -l

Output similar to the following is displayed:

926159 598519126 START read vdev v2 dg rootdg dev 40,6 block 895389 len 1 concurrency 1 pid 3943
926159 598519127 END read vdev v2 dg rootdg op 926159 block 895389 len 1 time 1
926160 598519128 START read vdev v2 dg rootdg dev 40,6 block 895390 len 1 concurrency 1 pid 3943
926160 598519128 END read vdev v2 dg rootdg op 926160 block 895390 len 1 time 0