volwatch(8)

NAME
  volwatch - Monitors the Logical Storage Manager (LSM) for failure events
  and performs hot sparing

SYNOPSIS
  /usr/sbin/volwatch [-m] [-s] [-o volrecover_arg] [mail-addresses...]

OPTIONS
  -m  Runs volwatch with mail notification support, which notifies root (by
      default) and any other specified users when a failure occurs. This
      option is enabled by default.

  -s  Runs volwatch with hot spare support.

  -o volrecover_arg
      Specifies an argument that volwatch passes directly to the volrecover
      command when it runs volrecover. This option applies only when
      hot-spare support is enabled.

DESCRIPTION
  The volwatch command monitors LSM, waiting for exception events to occur.
  When an exception event occurs, the volwatch command uses mailx(1) to send
  mail to:

    ·  The root account.

    ·  The user accounts specified when you use the rcmgr command to set the
       VOLWATCH_USERS variable in the /etc/rc.config.common file.

    ·  The user account that you specify on the command line with the
       volwatch command.
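  The recipient rules above amount to the union of three sources. The
  following is only an illustration, not volwatch's implementation: the
  function name volwatch_recipients is hypothetical, and it assumes that
  VOLWATCH_USERS holds a space-separated list of accounts.

```shell
# Hypothetical sketch of the recipient rules above (not volwatch's code).
# Assumes VOLWATCH_USERS is a space-separated list of accounts.
volwatch_recipients() {
    # root first, then VOLWATCH_USERS, then command-line addresses;
    # ${VOLWATCH_USERS:-} is left unquoted on purpose so it splits into words
    for addr in root ${VOLWATCH_USERS:-} "$@"; do
        printf '%s\n' "$addr"
    done | awk '!seen[$0]++'    # drop duplicates, keep first-seen order
}
```

  For example, with VOLWATCH_USERS set to "ops", the call
  volwatch_recipients admin@example.com prints root, ops, and
  admin@example.com on separate lines.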

  The volwatch command uses the volnotify command to wait for events to
  occur. When an event occurs, there is a 15-second delay before the failure
  is analyzed and the message is sent. This delay allows a group of related
  events to be collected and reported in a single mail message. By default,
  the volwatch command starts automatically when the system boots.
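  The 15-second collection window can be modeled as follows. This is a
  hypothetical sketch: batch_events is an invented name, and the input
  format (one event per line, prefixed with a time in seconds) is an
  assumption, not the actual output of volnotify. Each printed line stands
  for one mail message.

```shell
# Hypothetical model of the 15-second batching window (not volwatch's code).
# Input: one event per line as "<seconds> <event text>", in time order.
# A batch starts at the first unbatched event and absorbs every event that
# arrives less than DELAY seconds after it; one line is printed per batch.
batch_events() {
    awk -v delay="${1:-15}" '
        {
            t = $1
            $1 = ""                       # drop the timestamp field
            sub(/^ /, "")                 # and the separator it leaves
            if (n == 0 || t - start >= delay) {
                if (n > 0) print batch    # flush the previous batch
                start = t; batch = $0; n = 1
            } else {
                batch = batch "; " $0; n++
            }
        }
        END { if (n > 0) print batch }
    '
}
```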

  To start volwatch with hot-spare support, enter the volwatch -s command.
  Hot-spare support:

    ·  Detects LSM events resulting from the failure of a disk, plex, or
       RAID5 subdisk.

    ·  Sends mail to the root account (and other specified accounts) with
       notification about the failure and identifies the affected LSM
       objects.

    ·  Determines which subdisks to relocate, finds space for those subdisks
       in the disk group, relocates the subdisks, and notifies the root
       account (and other specified accounts) of these actions and their
       success or failure.

       When a partial disk failure occurs (that is, a failure affecting only
       some subdisks on a disk), redundant data on the failed portion of the
       disk is relocated, and existing volumes composed of the unaffected
       portions of the disk remain accessible.

                                     Note

       Hot-sparing is performed only for redundant (mirrored or RAID5)
       subdisks on a failed disk. Non-redundant subdisks on a failed disk are
       not relocated, but you are notified of the failure.

       Only one volwatch daemon can be running on a system or cluster node at
       any time.

       Hot-sparing does not guarantee the same layout of data or the same
       performance after relocation. You may want to make some configuration
       changes after hot-sparing occurs.

  Mail Notification Support

  The following is a sample mail notification when a failure is detected:

       Failures have been detected by the Logical Storage Manager:

       failed disks:

       medianame

       ...

       failed plexes:

       plexname

       ...

       failed log plexes:

       plexname

       ...

       failing disks:

       medianame

       ...

       failed subdisks:

       subdiskname

       ...

       The Logical Storage Manager will attempt to find spare disks,
       relocate failed subdisks and then recover the data in the failed plexes.

  The following describes the sections of the mail message:

    ·  The medianame list under failed disks specifies disks that appear to
       have failed completely.

    ·  The medianame list under failing disks indicates a partial disk
       failure or a disk that is in the process of failing. When a disk has
       failed completely, the same medianame appears under both failed disks
       and failing disks.

    ·  The plexname list under failed plexes shows plexes that have been
       detached because of I/O failures on subdisks they contain.

    ·  The plexname list under failed log plexes indicates RAID5 or dirty
       region log (DRL) plexes that have experienced failures.

    ·  The subdiskname list under failed subdisks specifies subdisks in RAID5
       volumes that have been detached due to I/O errors.
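  The rule for telling complete failures from partial ones can be applied
  mechanically to a saved notification. The following awk-based helper is a
  hypothetical sketch (classify_disks is an invented name) that assumes the
  message layout shown in the sample above, with one name per line under
  each heading.

```shell
# Hypothetical helper: classify disks named in a volwatch notification.
# Reads the mail body on stdin. A disk listed under both "failed disks:"
# and "failing disks:" has failed completely; a disk listed only under
# "failing disks:" has a partial or developing failure.
classify_disks() {
    awk '
        /^[ \t]*failed disks:/  { sect = "failed";  next }
        /^[ \t]*failing disks:/ { sect = "failing"; next }
        /:[ \t]*$/              { sect = ""; next }   # any other heading
        NF == 1 && $1 != "..." && sect != "" {
            if (sect == "failed")  failed[$1] = 1
            if (sect == "failing") failing[$1] = 1
        }
        END {
            for (d in failing)
                print d, (d in failed ? "complete" : "partial")
            for (d in failed)
                if (!(d in failing)) print d, "complete"
        }
    ' | sort
}
```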

  Enabling Hot-Sparing

  By default, hot-sparing is disabled. To enable hot-sparing, enter the
  volwatch command with the -s option, for example:

       # volwatch -s

  To use hot-spare support, you should configure a disk as a spare, which
  identifies the disk as an available site for relocating failed subdisks.
  Disks that are identified as spares are not used for normal allocations
  unless you explicitly specify otherwise. This ensures that there is a pool
  of spare disk space available for relocating failed subdisks and that this
  disk space is not consumed by normal operations.

  Spare disk space is the first space used to relocate failed subdisks.
  However, if no spare disk space is available or if the available spare disk
  space is not suitable or sufficient, free disk space is used.

  You must initialize a spare disk and place it in a disk group as a spare
  before it can be used for replacement purposes. If no disks are designated
  as spares when a failure occurs, LSM automatically uses any available free
  disk space in the disk group in which the failure occurs. If there is not
  enough spare disk space, a combination of spare disk space and free disk
  space is used.
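  The order in which spare and free disk space are used can be summarized
  with a simplified model. In this hypothetical sketch, plan_relocation is
  an invented name, sizes are abstract units, and a single pool of spare
  space and a single pool of free space are assumed. The real selection
  also checks suitability and redundancy, which this model ignores.

```shell
# Simplified, hypothetical model of the space-selection order: spare space
# first, free space for any shortfall, no relocation if both are too small.
# Usage: plan_relocation NEEDED SPARE_AVAILABLE FREE_AVAILABLE
# (variables are global because POSIX sh has no "local")
plan_relocation() {
    need=$1 spare=$2 free=$3
    if [ "$need" -le "$spare" ]; then
        echo "spare:$need"                      # spare space alone suffices
    elif [ "$spare" -eq 0 ] && [ "$need" -le "$free" ]; then
        echo "free:$need"                       # no spares: free space only
    elif [ "$need" -le $((spare + free)) ]; then
        echo "spare:$spare free:$((need - spare))"   # combination
    else
        echo "none"                             # relocation not possible
    fi
}
```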

  When hot-sparing selects a disk for relocation, it preserves the redundancy
  characteristics of the LSM object to which the relocated subdisk belongs.
  For example, hot-sparing ensures that subdisks from a failed plex are not
  relocated to a disk containing a mirror of the failed plex. If redundancy
  cannot be preserved using available spare disks and/or free disk space,
  hot-sparing does not take place. If relocation is not possible, mail is
  sent indicating that no action was taken.
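  The redundancy check can be pictured as a filter over candidate disks.
  This is a hypothetical sketch (eligible_targets and the disk names are
  invented) that covers only the rule stated above: a subdisk is not
  relocated to a disk that holds a mirror of its plex.

```shell
# Hypothetical sketch of the redundancy rule (not LSM's implementation).
#   $1    - space-separated disks used by the volume's other plexes
#   $2... - candidate disks (spares or disks with free space)
# Prints the candidates that do not hold a mirror of the failed plex.
eligible_targets() {
    busy=" $1 "
    shift
    for disk in "$@"; do
        case $busy in
            *" $disk "*) ;;            # holds a mirror: would lose redundancy
            *) echo "$disk" ;;
        esac
    done
}
```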

  When hot-sparing takes place, the failed subdisk is removed from the
  configuration database and LSM takes precautions to ensure that the disk
  space used by the failed subdisk is not recycled as free disk space.

  Initializing and Removing Hot-Spare Disks

  Although hot-sparing does not require you to designate disks as spares, HP
  recommends that you initialize at least one disk as a spare within each
  disk group; this gives you control over which disks are used for
  relocation. If no spare disks exist, LSM uses available free disk space
  within the disk group. When free disk space is used for relocation,
  performance may degrade after the relocation.

  Follow these guidelines when choosing a disk to configure as a spare:

    ·  The hot-spare feature works best if you specify at least one spare
       disk in each disk group containing mirrored or RAID5 volumes.

    ·  If a given disk group spans multiple controllers and has more than one
       spare disk, set up the spare disks on different controllers (in case
       one of the controllers fails).

    ·  For a mirrored volume, the disk group must have at least one disk that
       does not already contain one of the volume's mirrors. This disk should
       either be a spare disk with some available space or a regular disk
       with some free space.

    ·  For a mirrored and striped volume, the disk group must have at least
       one disk that does not already contain one of the volume's mirrors or
       another subdisk in the striped plex. This disk should either be a
       spare disk with some available space or a regular disk with some free
       space.

    ·  For a RAID5 volume, the disk group must have at least one disk that
       does not already contain the volume's RAID5 plex or one of its log
       plexes. This disk should either be a spare disk with some available
       space or a regular disk with some free space.

    ·  If a mirrored volume has a DRL log subdisk as part of its data plex
       (that is, volprint does not list the plex length as LOGONLY), that
       plex cannot be relocated. Therefore, place log subdisks in plexes that
       contain no data (log plexes). By default, the volassist command
       creates log plexes.

    ·  If you mirror the root disk, the rootdg disk group should contain an
       empty spare disk that satisfies the restrictions for mirroring the
       root disk.

    ·  Although it is possible to build LSM objects on spare disks, it is
       preferable to use spare disks for hot-sparing only.

    ·  When relocating subdisks off a failed disk, LSM attempts to use a
       spare disk large enough to hold all data from the failed disk.

  To initialize a disk that has no associated subdisks as a spare, use the
  voldiskadd command and enter y at the following prompt:

       Add disk as a spare disk for newdg? [y,n,q,?] (default: n) y

  To initialize an existing LSM disk as a spare disk, enter:

       # voledit set spare=on medianame

  For example, to initialize a disk called test03 as a spare disk, enter:

       # voledit set spare=on test03

  To remove a disk as a spare, enter:

       # voledit set spare=off medianame

  For example, to make a disk called test03 available for normal use, enter:

       # voledit set spare=off test03
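  The voledit commands above operate on one disk at a time. A hypothetical
  helper (set_spares is an invented name) can apply the same documented
  syntax to several media names; it stops at the first voledit failure.

```shell
# Hypothetical convenience wrapper around the voledit syntax shown above.
# Usage: set_spares on|off medianame...
set_spares() {
    state=$1        # "on" designates spares, "off" releases them
    case $state in
        on|off) ;;
        *) echo "usage: set_spares on|off medianame..." >&2; return 2 ;;
    esac
    shift
    for dm in "$@"; do
        voledit set spare="$state" "$dm" || {
            echo "voledit failed for $dm" >&2
            return 1
        }
    done
}
```

  For example, set_spares on test03 test04 runs voledit set spare=on for
  test03 and then for test04.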

  Replacement Procedure

  In the event of a disk failure, mail is sent. If volwatch was started with
  hot-spare support (the -s option), volwatch also attempts to relocate any
  subdisks that appear to have failed. This involves finding appropriate
  spare disk space or free disk space in the same disk group as the failed
  subdisk.

  To choose among the eligible spare disks, volwatch tries to use the disk
  that is closest to the failed disk. Closeness is determined by the
  controller, target, and disk number of the failed disk. For example, a
  disk on the same controller as the failed disk is closer than a disk on a
  different controller, and a disk under the same target as the failed disk
  is closer than one under a different target.
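  The closeness ordering can be sketched as a ranking. The actual metric
  volwatch uses is not documented here; in this hypothetical sketch,
  closest_spare and its weights are invented, and each candidate is
  described by a media name followed by its controller, target, and disk
  numbers.

```shell
# Hypothetical closeness ranking (weights are invented, not volwatch's).
# Same controller beats different controller; then same target beats
# different target; then the nearest disk number wins.
# Usage: closest_spare CTRL TARGET DISK < candidates
#   each candidate line: "<medianame> <ctrl> <target> <disk>"
# Assumes disk-number differences stay well below 1000000.
closest_spare() {
    awk -v c="$1" -v t="$2" -v d="$3" '
        {
            score = ($2 != c) * 2000000 + ($3 != t) * 1000000
            diff = $4 - d
            score += (diff < 0 ? -diff : diff)
            if (best == "" || score < bestscore) { best = $1; bestscore = score }
        }
        END { if (best != "") print best }
    '
}
```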

  If no spare or free disk space is found, the following mail message is sent
  explaining the disposition of volumes on the failed disk:

       Relocation was not successful for subdisks on disk dm_name
       in volume v_name in disk group dg_name.
       No replacement was made and the disk is still unusable.

       The following volumes have storage on medianame:

       volumename
       ...

       These volumes are still usable, but the redundancy of
       those volumes is reduced. Any RAID-5 volumes with storage
       on the failed disk may become unusable in the face of further
       failures.

  If non-RAID5 volumes are made unusable due to the failure of the disk, the
  following is included in the mail message:

       The following volumes:

       volumename
       ...

       have data on medianame but have no other usable
       mirrors on other disks. These volumes are now unusable
       and the data on them is unavailable.  These volumes must
       have their data restored.

  If RAID5 volumes are made unavailable due to the disk failure, the
  following is included in the mail message:

       The following RAID-5 volumes:

       volumename
       ...

       have storage on medianame and have experienced
       other failures. These RAID-5 volumes are now unusable
       and data on them is unavailable. These RAID-5 volumes must
       have their data restored.

  If spare disk space is found, LSM attempts to set up a subdisk on the spare
  disk and use it to replace the failed subdisk. If this is successful, the
  volrecover command runs in the background to recover the data in volumes
  on the failed disk.

  If the relocation fails, the following mail message is sent:

       Relocation was not successful for subdisks on disk dm_name in
       volume v_name in disk group dg_name.  No replacement was made
       and the disk is still unusable.

       error message

  If any volumes (RAID5 or otherwise) are rendered unusable due to the
  failure, the following is included in the mail message:

       The following volumes:

       volumename
       ...

       have data on dm_name but have no other usable mirrors on other
       disks. These volumes are now unusable and the data on them is
       unavailable. These volumes must have their data restored.

  If the relocation procedure completes successfully and recovery is under
  way, the following mail message is sent:

       Volume v_name Subdisk sd_name relocated to newsd_name,
       but not yet recovered.

  Once recovery has completed, a message is sent relaying the outcome of the
  recovery procedure. If the recovery was successful, the following is
  included in the mail message:

       Recovery complete for volume v_name in disk group dg_name.

  If the recovery was not successful, the following is included in the mail
  message:

       Failure recovering v_name in disk group dg_name.

SEE ALSO
  mailx(1), rcmgr(8), voldiskadm(8), voledit(8), volintro(8), volrecover(8),
  volrootmir(8)