3    Using hwmgr to Manage Hardware

The principal command that you use to manage hardware is the hwmgr command line interface (CLI). Other interfaces, such as the SysMan tasks, provide a limited subset of the features provided by hwmgr. For example, you can use hwmgr to set an attribute for all components of a particular type (such as SCSI disks) on all SCSI adapters in all members of a cluster.

Most hardware management is performed automatically by the system, and you need to intervene only under certain circumstances, such as replacing a failed component so that the replacement component takes on the identity of the failed component. This chapter discusses the following topics:

3.1    Understanding the Hardware Management Model

Within the operating system kernel, hardware data is organized as a hardware set managed by the kernel set manager. Application requests are passed by library routines to kernel code, or remote code. The latter deals with requests to and from other systems. The hardware component module (HWC) resides in the kernel, and contains all the registration routines to create and maintain hardware components in the hardware set. It also contains the device nodes for device special file management, which is performed by using the dsfmgr command.

The hardware set consists of data structures that describe all of the hardware components that are part of the system. A hardware component becomes part of the hardware set when registered by its driver. Many components support attributes that describe their function and content or control how they operate. Each attribute is assigned a value. You can read, and sometimes manipulate, these attribute values by using the hwmgr command.

The system hardware is organized into three parts, identified as subsystems by the hwmgr command. The subsystems are identified as component, SCSI, and name. The subsystems are related to the system hardware databases as follows:

The general features of hwmgr are as follows:

3.2    Understanding hwmgr Command Options

The hwmgr command works with the kernel hardware management module, providing you with the ability to manage hardware components. Examples of a hardware component are storage peripherals, such as a disk or tape, or a system component such as a CPU or a bus. Use the hwmgr command to manage hardware components on either a single system or on a cluster.

Operational commands are characterized by a subsystem identifier after the command name. The subsystems are: component, scsi, and name.

Some hwmgr operation commands are available for more than one subsystem. You should use the subsystem most closely associated with the type of operation you want to perform, depending on the parameter information that you obtained using the view and show command options.

Some commands require you to specify a subsystem name. However, if you specify the identity of a hardware component, then you do not need to specify a subsystem name. The hwmgr command is able to determine the correct subsystem on which to operate, based on the component identifier.

The command options are organized by task application. The command options, the subsystems on which they operate, and the nature of the operation are listed in the following table:

Option        Subsystem                    Operation
-----------   --------------------------   ---------------------------
add           name                         Database management
delete        component, name, and scsi    Database management
edit          name, scsi                   Database management
locate        component                    Hardware configuration
offline       component, name              Online Addition and Removal
online        component, name              Online Addition and Removal
power         component, name              Online Addition and Removal
redirect      scsi                         Hardware configuration
refresh       component, scsi              Database management
reload        name                         Driver configuration
remove        name                         Database management
scan          component, name, and scsi    Hardware configuration
status        component                    Hardware configuration
unconfigure   component, name              Hardware configuration
unindict      component                    Online Addition and Removal
unload        name                         Driver configuration

3.3    Configuring the hwmgr Environment

The hwmgr command provides environment settings that you can use to control the amount of information displayed. Use the following command to display the default environment settings:

# /sbin/hwmgr view env
 
  HWMGR_DATA_FILE = "/etc/hwmgr/hwmgr.dat"
  HWMGR_DEBUG = FALSE
  HWMGR_HEXINTS = FALSE
  HWMGR_NOWRAP = FALSE
  HWMGR_VERBOSE = FALSE

You can set the value of environment variables in your login script, or at the command line as shown in the following example:

# HWMGR_VERBOSE=TRUE
# export HWMGR_VERBOSE

You usually need to define only the values of the HWMGR_HEXINTS, HWMGR_NOWRAP, and HWMGR_VERBOSE environment variables as follows:

3.4    Using hwmgr to Manage Hardware

The following sections contain examples of tasks that you might need to perform by using the hwmgr command. Some of these examples might not be useful for managing a small server with few components attached to the CPU. However, when you are managing a large installation with many networked systems or clusters with hundreds of components, they become very useful. Using the hwmgr command enables you to connect to an unfamiliar system, obtain information about its component hierarchy, and then perform administrative tasks without any previous knowledge about how the system is configured and without consulting system logs or files to find components.

3.4.1    Locating SCSI Hardware

The locate option, which currently works only for some SCSI devices, enables you to identify a device. You might use this command when you are trying to physically locate a SCSI disk. The following command flashes the light on a SCSI disk for one minute:

# /sbin/hwmgr locate -id 42 -time 60

You can then check the disk bays for the component that is flashing its light. You cannot use this option to locate some SCSI devices such as CD-ROM readers and disks that are part of an array (such as an HSV110). However, disks that are part of an array are detected on failure and an event is posted in the Event Manager log. The array controller identifies the failed disk by flashing its red or amber error light. You do not need to manually search for the failed device.

3.4.2    Viewing the System Hierarchy

Use the view command to view the hierarchy of hardware within a system. This command enables you to find what adapters are controlling devices, and discover where adapters are installed on buses. The following example shows typical output on a small system that is not part of a cluster:

# /sbin/hwmgr view hierarchy
HWID:   hardware hierarchy
----------------------------------------------------
   1:   platform AlphaServer 800 5/500
   2:     cpu CPU0
   6:     bus pci0
   7:       connection pci0slot5
  15:         scsi_adapter isp0
  16:           scsi_bus scsi0
  32:             disk bus-0-targ-0-lun-0 dsk0
  33:             disk bus-0-targ-4-lun-0 cdrom0
  34:             disk bus-0-targ-5-lun-0 dsk1
  35:             disk bus-0-targ-6-lun-0 dsk2
  36:             disk bus-0-targ-8-lun-0 dsk3
   9:       connection pci0slot6
  17:         graphics_controller s3trio0
output truncated

Some components might appear as multiple entries in the hierarchy. For example, if a disk is on a SCSI bus that is shared between two adapters, the hierarchy shows two entries for the same device. You can obtain similar views of the system hardware hierarchy by using the SysMan Station GUI.
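When the hierarchy is long, it can help to extract only the chain of components that lead to one device. The following is a minimal sketch, not part of hwmgr: the function name hw_ancestors is hypothetical, and the script assumes that nesting depth can be inferred from the indentation shown in the sample output above.

```shell
# hw_ancestors NAME
# Reads "hwmgr view hierarchy" output on stdin and prints the chain of
# components from the top of the hierarchy down to the first entry whose
# last field is NAME. Nesting depth is inferred from indentation, which
# is an assumption based on the sample output layout.
hw_ancestors() {
    awk -v target="$1" '
    /^ *[0-9]+:/ {
        entry = $0
        sub(/^ *[0-9]+:/, "", entry)       # drop the "HWID:" prefix
        width = length(entry)
        sub(/^ +/, "", entry)              # strip the indentation
        depth = width - length(entry)      # indentation width = nesting depth
        for (d in chain)                   # forget entries nested deeper
            if (d + 0 > depth) delete chain[d]
        chain[depth] = entry
        if ($NF == target) {
            for (d = 0; d <= depth; d++)
                if (d in chain) print chain[d]
            exit
        }
    }'
}

# Example use on a system that provides hwmgr:
#   /sbin/hwmgr view hierarchy | hw_ancestors dsk1
```

If a device appears more than once because it is on a shared bus, this sketch reports only the first path that it finds.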

3.4.3    Viewing Component Categories

To perform hardware management options on all components of the same category, or to select a particular component in a category, you might need to know what categories of components are available. The hardware manager get category command fetches all the possible values for hardware categories.

This command is useful when you use it in conjunction with the get attributes and set attributes options, which enable you to display and configure the attributes (or properties) of a particular component. When you know the hardware categories you can limit your attribute queries to a specific type of hardware, as follows:

# /sbin/hwmgr get category
  Hardware Categories
  -------------------
  category = undefined
  category = platform
  category = cpu
  category = pseudo
  category = bus
  category = connection
  category = serial_port
  category = keyboard
  category = pointer
  category = scsi_adapter
  category = scsi_bus
  category = network
  category = graphics_controller
  category = disk
  category = tape

Knowing the categories, you can focus your attribute query by specifying a category as follows:

# /sbin/hwmgr get attribute -category platform
  1:
   name = AlphaServer 800 5/500
   category = platform
 
 

This output informs you that the system platform has a hardware ID of 1, and that the platform name is AlphaServer 800 5/500. See also the get attribute and set attribute command options.

3.4.4    Obtaining Component Attributes

Attributes are characteristics of the component that might be read-only information, such as the model number of the component, or you might be able to set a value to control some aspect of the behavior of the component, such as the speed at which it operates. The get attribute command option fetches and displays attributes for a component. The hardware manager command is specific to managing hardware and fetches only attributes from the hardware set. All hardware components are identified by a unique hardware identifier, otherwise known as the hardware ID or HWID.

The following command fetches all attributes for all hardware components on the local system and directs the output to a file that you can search for information:

# /sbin/hwmgr get attribute > sysattr.txt

However, if you know which component category you want to query, as described in Section 3.4.3, you can focus your query on that particular category.

Querying a hardware component category for its attributes can provide useful information. For example, you might not be sure if the network is working for some reason. You might not even know what type of network adapters are installed in a system or how they are configured. Use the get attribute option to determine the status of network adapters as shown in the following example:

# /sbin/hwmgr get attribute -category network
  203:
   name = ln0
   category = network
   sub_category = Ethernet
   model = DE422
   hardware_rev =
   firmware_rev =
   MAC_address = 08-00-2B-3E-08-09
   MTU_size = 1500
   media_speed = 10
   media_selection = Selected by Jumpers/Switches
   media_type =
   loopback_mode = 0
   promiscuous_mode = 0
   full_duplex = 0
   multicast_address_list = CF-00-00-00-00-00 \
    01-00-5E-00-00-01
   interface_number = 1

This output provides you with the following information:

In some cases, you can change the value of a component attribute to modify component information or change its behavior on the system. Setting attributes is described in Section 3.4.5. To find which attributes are settable, you can use the get option to fetch all attributes and use the grep command to search for the (settable) keyword as follows:

# /sbin/hwmgr get attribute | grep settable

   device_starvation_time = 25 (settable)
   device_starvation_time = 0 (settable)
   device_starvation_time = 25 (settable)
   device_starvation_time = 25 (settable)
   device_starvation_time = 25 (settable)
   device_starvation_time = 25 (settable)

The output shows that there is one settable attribute on the system, device_starvation_time. Having found this, you can now obtain a list of components that support this attribute as follows:

# /sbin/hwmgr get attribute -a device_starvation_time
  23:
   device_starvation_time = 25 (settable)
  24:
   device_starvation_time = 0 (settable)
  25:
   device_starvation_time = 25 (settable)
  31:
   device_starvation_time = 25 (settable)
  34:
   device_starvation_time = 25 (settable)
  35:
   device_starvation_time = 25 (settable)

The output from this command displays the HWID of the components that support the device_starvation_time attribute. Reading the HWID in the hierarchy output, you can determine that this attribute is supported by SCSI disks.
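Because grep discards the HWID lines, you might prefer a filter that keeps each settable attribute together with the component that owns it. The following sketch is an illustration only: the function name hw_settable is hypothetical, and it assumes the output layout shown above, in which a line such as "24:" introduces each component and settable attributes end with the (settable) marker.

```shell
# hw_settable
# Reads "hwmgr get attribute" output on stdin and prints one line per
# settable attribute in the form: HWID attribute value
hw_settable() {
    awk '
    /^ *[0-9]+: *$/ {                      # a line such as "  24:" starts a component
        hwid = $1
        sub(/:$/, "", hwid)
        next
    }
    /\(settable\) *$/ {
        line = $0
        sub(/ *\(settable\) *$/, "", line) # drop the marker
        eq = index(line, "=")
        if (eq == 0) next
        name = substr(line, 1, eq - 1)
        value = substr(line, eq + 1)
        gsub(/^ +| +$/, "", name)          # trim surrounding spaces
        gsub(/^ +| +$/, "", value)
        print hwid, name, value
    }'
}

# Example use on a system that provides hwmgr:
#   /sbin/hwmgr get attribute | hw_settable
```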

To determine the link speed of a Fibre Channel adapter, you can query its link speed, as follows:

# /sbin/hwmgr get attribute -a link_speed
656:
  link_speed = 1Ghz

If more than one adapter is connected, the preceding command displays the attribute value for all the adapters. The link_speed attribute supports the following values:

See also the set attribute and get category options.

3.4.5    Setting Component Attributes

The set attribute command option allows you to set (or configure) the value of settable attributes. You cannot set all component attributes. When you use the get attribute command option, the output flags any configurable attributes by labeling them as (settable) next to the attribute value. A method of finding such attributes is described in Section 3.4.4.

As demonstrated in Section 3.4.4, the value of device_starvation_time is an example of a settable attribute supported by SCSI disks. This attribute controls the amount of time that must elapse before the disk driver determines that a component is unreachable due to SCSI bus starvation (no data transmitted). If the device_starvation_time expires before the driver is able to determine that the component is still there, the driver posts an error event to the binary error log.

Using the following commands, you can change the value of the device_starvation_time attribute for the component with the HWID of 24, and then verify the new value:

# /sbin/hwmgr set attribute -id 24 \
-a device_starvation_time=60
# /sbin/hwmgr get attribute -id 24 \
-a device_starvation_time
  24:
   device_starvation_time = 60 (settable)

This action does not change the saved value for this attribute. Each attribute has three possible values: a current value, a saved value, and a default value. The default value is a constant and you cannot modify it. If you never set a value for an attribute, the default value applies. When you set the saved value, it persists across reboots. You can think of it as a permanent override of the default.

When you set the current value, it does not persist across reboots. You can think of it as a temporary value for the attribute. When a system is rebooted, the value of the attribute reverts to the saved value (if there is a saved value). If there is no saved value, the attribute value reverts to the default value. Setting an attribute value always changes the current value of the attribute. The following examples show how you get and set the saved value of an attribute:

# /sbin/hwmgr get attribute saved -id 24 \
-a device_starvation_time
  24:
   saved device_starvation_time = 0 (settable)
 
# /sbin/hwmgr set attribute saved -id 24 \
-a device_starvation_time=60 
    saved device_starvation_time = 60 (settable)
# /sbin/hwmgr get attribute saved -id 24 \
-a device_starvation_time
  24:
   saved device_starvation_time = 60 (settable)
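The rule for which value applies after a reboot can be summarized in a few lines of shell. This is a sketch of the rule as described above, not of how hwmgr is implemented. The function name value_after_reboot is hypothetical, an empty first argument stands for "no saved value", and the default of 25 is used only for illustration.

```shell
# value_after_reboot SAVED DEFAULT
# Prints the value an attribute takes after a reboot: the saved value if
# one exists, otherwise the default. The current value is not an input
# because it does not survive a reboot. An empty SAVED argument stands
# for "no saved value" (a convention of this sketch only).
value_after_reboot() {
    saved=$1
    default=$2
    if [ -n "$saved" ]; then
        printf '%s\n' "$saved"
    else
        printf '%s\n' "$default"
    fi
}

value_after_reboot 60 25    # a saved value overrides the default
value_after_reboot "" 25    # no saved value, so the default applies
```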

See also the get attribute and get category command options.

3.4.6    Viewing the Cluster

If you are working on a cluster, you often need to focus hardware management commands at a particular host on the cluster. The view cluster command option enables you to obtain details of the hosts in a cluster. The following sample output shows a typical cluster:

# /sbin/hwmgr view cluster
  Member ID     State   Member HostName
  ---------     -----   ---------------
    1           UP      ernie.zok.paq.com (localhost)
    2           UP      bert.zok.paq.com
    3           DOWN    bigbird.zok.paq.com

You can also use this option to verify that the hwmgr command is aware of all cluster members and their current status.

The preceding example indicates a three-member cluster with one member (bigbird) currently down. The (localhost) marker indicates that hwmgr is currently running on cluster member ernie. Any hwmgr commands that you enter by using the -cluster option are sent to members bert and ernie, but not to bigbird because that system is unavailable. Additionally, any hwmgr commands that you issue with the -member bigbird option fail because the cluster member state for that host is DOWN.
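In a script, you might want the list of members that are able to receive commands before you use the -member option. The following filter is a minimal sketch with a hypothetical function name, cluster_members_up. It assumes the three-column layout shown in the preceding sample output.

```shell
# cluster_members_up
# Reads "hwmgr view cluster" output on stdin and prints the host name of
# every member whose state is UP. Header and separator lines are skipped
# because their first field is not a numeric member ID.
cluster_members_up() {
    awk '$1 ~ /^[0-9]+$/ && $2 == "UP" { print $3 }'
}

# Example use on a cluster member that provides hwmgr:
#   /sbin/hwmgr view cluster | cluster_members_up
```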

The view cluster command option works only if the system is a member of a cluster. If you attempt to run it on a single system, an error message is displayed. See also the clu_get_info command, and the TruCluster Server documentation for more information on clustered systems.

3.4.7    Viewing Devices

You can use the hwmgr command to display all components that have a device special file name, such as /dev/disk/dsk34, by using the view devices option. The hardware manager considers any hardware component that has the attribute dev_base_name to be an accessible device. (See Section 3.4.4 for information on obtaining the attributes of a device.)

The view devices option enables you to determine what components are currently registered with hardware management on a system, and provides information that enables you to access these components through their device special files. For example, if you load a CD-ROM into a reader, use this output to determine whether you can mount the CD-ROM reader as /dev/disk/cdrom0. The view devices option is also useful for finding the HWIDs of any registered devices. When you know the HWID for a device, you can use other hwmgr command options to query attributes on the device, or perform other operations on the device.

Typical output from this command is shown in the following example:

# /sbin/hwmgr view dev
 
 

  HWID:             DSF Name Mfg      Model       Location
 
----------------------------------------------------------------------
     3:            /dev/kevm
    22:      /dev/disk/dsk0c DEC      RZ26        bus-0-targ-3-LUN-0
    23:    /dev/disk/cdrom0c DEC      RRD42       bus-0-targ-4-LUN-0
    24:      /dev/disk/dsk1c DEC      RZ26L       bus-1-targ-2-LUN-0
    25:      /dev/disk/dsk2c DEC      RZ26L       bus-1-targ-4-LUN-0
    29:     /dev/ntape/tape0 DEC      TLZ06       bus-1-targ-6-LUN-0
    35:     /dev/disk/dsk8c  COMPAQ   RZ1CF-CF    bus-2-targ-12-LUN-0

The output shows all hardware components that have the dev_base_name attribute on the local system. The hardware manager attempts to resolve the dev_base_name to the full path location to the device special file, such as /dev/ntape/tape0. It always uses the path to the device special file with the c partition. The c partition represents the entire capacity of the device, except in the case of tapes. See Chapter 1 for information on device special file names and functions.

If you are working on a cluster, you can view all components registered with hardware management across the entire cluster with the -cluster option, as follows:

# /sbin/hwmgr view devices -cluster

  HWID:             DSF Name    Model       Location        Hostname
 
------------------------------------------------------------------
   20:   /dev/disk/floppy0c    3.5in          fdi0-unit-0   tril7e
   34:    /dev/disk/cdrom0c    RRD46   bus-0-targ-5-LUN-0   tril7e
   35:      /dev/disk/dsk0c    HSG80   bus-4-targ-1-LUN-1   tril7d
   35:      /dev/disk/dsk0c    HSG80   bus-6-targ-1-LUN-1   tril7e
   36:      /dev/disk/dsk1c    RZ26N   bus-1-targ-0-LUN-0   tril7e
   37:      /dev/disk/dsk2c    RZ26N   bus-1-targ-1-LUN-0   tril7e
   38:      /dev/disk/dsk3c    RZ26N   bus-1-targ-2-LUN-0   tril7e
   39:      /dev/disk/dsk4c    RZ26N   bus-1-targ-3-LUN-0   tril7e
   40:      /dev/disk/dsk5c    RZ26N   bus-1-targ-4-LUN-0   tril7e
   41:      /dev/disk/dsk6c    RZ26N   bus-1-targ-5-LUN-0   tril7e
   42:      /dev/disk/dsk7c    RZ26N   bus-1-targ-6-LUN-0   tril7e
   43:      /dev/disk/dsk8c    HSZ40   bus-3-targ-2-LUN-0   tril7d
   43:      /dev/disk/dsk8c    HSZ40   bus-3-targ-2-LUN-0   tril7e
   44:      /dev/disk/dsk9c    HSZ40   bus-3-targ-2-LUN-1   tril7d
   44:      /dev/disk/dsk9c    HSZ40   bus-3-targ-2-LUN-1   tril7e
   45:     /dev/disk/dsk10c    HSZ40   bus-3-targ-2-LUN-2   tril7d
   45:     /dev/disk/dsk10c    HSZ40   bus-3-targ-2-LUN-2   tril7e

Some devices, such as the disk with the HWID of 45:, appear more than once in this display. These are components that are on a shared bus between two cluster members. The hardware manager displays the component entry as seen from each cluster member.
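On a large cluster it can be tedious to pick out the shared components by eye. The following sketch lists each HWID that is reported more than once, together with the members that report it. The function name hw_shared_devices is hypothetical, and the script assumes that the first field is the HWID, the second is the device special file, and the last is the host name, as in the sample output above.

```shell
# hw_shared_devices
# Reads "hwmgr view devices -cluster" output on stdin and prints one line
# for each HWID that appears more than once, in the form:
#   HWID device-special-file host1 host2 ...
hw_shared_devices() {
    awk '
    /^ *[0-9]+:/ {
        id = $1
        sub(/:$/, "", id)
        count[id]++
        if (count[id] == 1) {
            order[++n] = id              # remember first-seen order
            dsf[id] = $2
            hosts[id] = $NF
        } else {
            hosts[id] = hosts[id] " " $NF
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            id = order[i]
            if (count[id] > 1) print id, dsf[id], hosts[id]
        }
    }'
}

# Example use on a cluster member that provides hwmgr:
#   /sbin/hwmgr view devices -cluster | hw_shared_devices
```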

See also the following hwmgr command options: show scsi, show components, and get attributes.

3.4.8    Viewing Transactions

Hardware management operations are transactions that must be synchronized across a cluster. The view transaction command option displays the state of any hardware management transactions that have occurred since you booted the system. Use this option to find failed hardware management transactions.

If you do not specify the -cluster or -member option, the command displays status on transactions that are processed or initiated by the local host (the system on which the command is entered). The view transaction command option is primarily for debugging problems with hardware management in a cluster, and you are likely to use this command infrequently. The command has the following typical output:

# /sbin/hwmgr view transactions
   hardware management transaction status
  -----------------------------------------------------
  there is no active transaction on this system
   the last transaction initiated from this system was:
    transaction = modify cluster database
    proposal    = 3834
    sequence    = 0
    status      = 0
   the last transaction processed by this system was:
    transaction = modify cluster database
    proposal    = 3834
    sequence    = 0
    status      = 0
 
 proposal                      last status  success  fail
 ----------------------------  -----------  -------  ----
              Modify CDB/ 3838  0            3        0
                Read CDB/ 3834  0            3        0
            No operation/ 3835  0            1        0
             Change name/ 3836  0            0        0
             Change name/ 3837  0            0        0
               Locate HW/ 3832  0            0        0
                 Scan HW/ 3801  0            0        0
   Unconfig HW - confirm/ 3933  0            0        0
    Unconfig HW - commit/ 3934  0            0        0
     Delete HW - confirm/ 3925  0            0        0
      Delete HW - commit/ 3926  0            0        0
   Redirect HW - confirm/ 3928  0            0        0
   Redirect HW - commit1/ 3929  0            0        0
   Redirect HW - commit2/ 3930  0            0        0
          Refresh - lock/ 3937  0            0        0

This output indicates that the last transaction was a modification of the cluster database.
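To find failed transactions without reading the whole table, you can filter the summary rows for a nonzero count in the last column. The following is a sketch only: the function name hw_failed_proposals is hypothetical, and it assumes that the last three numbers on each summary row are the last status, success count, and fail count, as the column headings in the sample output suggest.

```shell
# hw_failed_proposals
# Reads "hwmgr view transactions" output on stdin and prints the name and
# fail count of each proposal whose fail count is greater than zero.
hw_failed_proposals() {
    awk '
    /\/ *[0-9]+ +-?[0-9]+ +[0-9]+ +[0-9]+ *$/ && $NF > 0 {
        name = $0
        sub(/\/.*$/, "", name)     # keep the text before the "/"
        sub(/^ +/, "", name)       # strip the leading padding
        print name, $NF
    }'
}

# Example use on a system that provides hwmgr:
#   /sbin/hwmgr view transactions | hw_failed_proposals
```

With the sample output shown above, this filter prints nothing because every fail count is 0.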

3.4.9    Creating a User-Defined SCSI Device Name

Most components have an identification attribute that is unique to the device. You can read it as the serial_number or name attribute of a SCSI device. For example, the following hwmgr command returns both these attributes for the component with a HWID of 30, a SCSI disk:

# /sbin/hwmgr get attribute -id 30 -a serial_number -a name
30:
  serial_number = SCSI-WWID:0c000008:0060-9487-2a12-4ed2
  name = SCSI-WWID:0c000008:0060-9487-2a12-4ed2

This string is known as a worldwide identifier (WWID) because it is unique for each component on the system.

Some components do not provide a unique identifier. The operating system creates such a number for the component by using valid path bus/target/LUN data that describes the physical location of the device. Because systems can share devices, each system that has access to the component sees a different path and creates its own unique WWID for that device. Concurrent I/O access to such shared devices might occur, possibly resulting in data corruption. To find such devices, use the following command:

# /sbin/hwmgr show component -cshared
 
 HWID:  HOSTNAME   FLAGS SERVICE COMPONENT NAME
-----------------------------------------------
   40:  joey       -cd-- iomap   SCSI-WWID:04100026:"DEC \
 RZ28M    (C) DEC00S846590H7CCX"
   41:  joey       -cd-- iomap   SCSI-WWID:04100026:"DEC \
 RZ28L-AS (C) DECJEE019480P2VSN"
   42:  joey       -cd-- iomap   SCSI-WWID:0410003a:"DEC \
 RZ28     (C) DECPCB=ZG34142470  ; HDA=000034579643"
   44:  joey       rcd-- iomap   SCSI-WWID:04100026:"DEC \
 RZ28M    (C) DEC00S735340H6VSR"
.
.
.
 
 

Some devices, such as the TL895 model media changer, do not support INQUIRY pages 0x80 or 0x83 and are unable to provide the system with a unique WWID. To support features such as path failover or installation into a cluster on a shared bus, you must manually add such devices to the system. This method is recommended only for adding media changers to a shared bus. Other types of devices, such as disks, CD-ROM readers, tape drives, or RAID controllers, provide a unique string (such as a serial number) from which the system can create a unique WWID. You can use such a component on a shared bus because its WWID is always the same and the operating system always recognizes it as the same device.

You can use the hwmgr command to create a user-defined unique name that in turn enables you to create a WWID known to all systems that are sharing the device. Because the component has a common WWID, it has one set of device special file names, preventing the risk of concurrent I/O.

The process for creating a user-defined name is as follows:

Caution

You must update all clustered systems that have access to the device.

The following example shows how you assign a user-defined name. Although the edit scsi command option is recommended only for devices that do not have a unique WWID, the example uses disks for the sake of simplicity.

# /sbin/hwmgr show scsi
 
      SCSI           DEVICE DEVICE  DRIVER NUM  DEVICE FIRST
HWID: DEVICEID HOST  TYPE   SUBTYPE OWNER  PATH FILE   VALID
      ID       NAME                                    PATH
 ------------------------------------------------------------
  22: 0       ftwod  disk   none    0      1   dsk0   [0/3/0]
  23: 1       ftwod  cdrom  none    0      1   cdrom0 [0/4/0]
  24: 2       ftwod  disk   none    0      1   dsk1   [1/2/0]
  25: 3       ftwod  disk   none    2      1   dsk2   [2/4/0]

This command displays which SCSI devices are on the system. On this system the administrator knows that there is a shared bus and that hardware components 24 and 25 are actually the same device. The WWID for this component is constructed by using the bus/target/LUN address information. Because the bus/target/LUN addresses are different, the component is seen as two separate devices. This can cause data corruption problems because the operating system might use two different sets of device special files to access the disk (/dev/disk/dsk1 and /dev/disk/dsk2).

The following command shows how you can rename the device, and how it appears after it is renamed:

# /sbin/hwmgr edit scsi -did 2 \
-uwwid "this is a test"
    hwmgr: Operation completed successfully.
 
# /sbin/hwmgr show scsi -did 2 -full

        SCSI               DEVICE  DEVICE  DRIVER NUM  DEVICE FIRST
  HWID: DEVICEID HOSTNAME  TYPE    SUBTYPE OWNER  PATH FILE   VALID PATH
  ----------------------------------------------------------------------
   24:  2        ftwod     disk    none    0      1    dsk1   [1/2/0]
 
      WWID:0910003c:"DEC    (C) DECZG41400123ZG41800340:d01t00002l00000"
      WWID:ff10000e:"this is a test"
 
      BUS   TARGET  LUN   PATH STATE
      ------------------------------
      1     2       0     valid

You repeat the operation on the other component path and the same name is given to the component at address 2/4/0. After you do this, hardware management uses your user-defined name to track the component and to recognize the alternate paths to the same device:

# /sbin/hwmgr edit scsi -did 3 -uwwid "this is a test"
    hwmgr: Operation completed successfully.
 
# /sbin/hwmgr show scsi -did 3 -full

        SCSI               DEVICE  DEVICE  DRIVER NUM  DEVICE FIRST
  HWID: DEVICEID HOSTNAME  TYPE    SUBTYPE OWNER  PATH FILE   VALID PATH
  ----------------------------------------------------------------------
   25:  3        ftwod     disk    none    0      1    dsk1   [2/4/0]
 
      WWID:0910003c:"DEC    (C) DECZG41400123ZG41800340:d02t00004l00000"
      WWID:ff10000e:"this is a test"
 
      BUS   TARGET  LUN   PATH STATE
      ------------------------------
      2     4       0     valid

Both of these devices now use the same device special file name (/dev/disk/dsk1). There is no longer any risk of data corruption resulting from two sets of device special files accessing the same disk.

3.4.10    Deleting a SCSI Device

Under some circumstances, you might want to remove a SCSI device from a system, such as when it is logging errors and you must replace it. Use the delete scsi command option to remove a SCSI component from all hardware management databases clusterwide. This option unregisters the component from the kernel, removes all persistent database entries for the device, and removes all device special files. When you delete a SCSI component, it is no longer accessible and its device special files are removed from the appropriate /dev subdirectory. You cannot delete a SCSI component that is currently open. You must terminate all I/O connections to the device (such as mounts).

You might need to delete a SCSI component if you are removing it from your system and you do not want information about the component remaining on the system. You might also want to delete a SCSI component because of operating system problems, rather than hardware problems. For example, the component might operate correctly, but you cannot access it through the device special file for some reason. In this case you can delete the component and use the scan scsi command option to find and register it.

To replace the SCSI component (or bring the old component back) you can use the scan scsi command option to find the component again. However, when you delete a component and then perform a scan operation to bring the component back on line, it does not always have the same device special file. To replace a component as an exact replica of the original, you must perform the additional operations described in Section 3.4.12. A scan operation might not find the component if it is not actively responding during the bus scan.

This option accepts the SCSI device identifier -did, which is not equivalent to the HWID. The following examples show how you examine the SCSI database and then delete a SCSI device:

# /sbin/hwmgr show scsi

      SCSI           DEVICE DEVICE  DRIVER NUM  DEVICE FIRST
HWID: DEVICEID HOST- TYPE   SUBTYPE OWNER  PATH FILE   VALID
               NAME                                     PATH
------------------------------------------------------------
23:   0       bert   disk   none    2      1   dsk0   [0/3/0]
24:   1       bert   cdrom  none    0      1   cdrom0 [0/4/0]
25:   2       bert   disk   none    0      1   dsk1   [1/2/0]
30:   4       bert   tape   none    0      1   tape2  [1/6/0]
31:   3       bert   disk   none    0      1   dsk4   [1/4/0]
34:   5       bert   disk   none    0      1   dsk7   [2/5/0]
35:   6       bert   disk   none    0      1   dsk8

In this example, the DRIVER OWNER field is not zero for component ID 23, indicating that the device is currently open by a driver. Any number other than zero in the DRIVER OWNER field means that a driver has opened the component for use. Therefore, you cannot delete SCSI component 23 because it is currently in use.

However, component ID 35 is not open by a driver, and it currently has no valid paths shown in the FIRST VALID PATH field. The component is not currently accessible and you can delete it safely. When you delete the device, you also delete the /dev/disk/dsk8* and /dev/rdisk/dsk8* device special files.
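The two checks described above (a DRIVER OWNER of zero and no valid path) can be applied to every row at once. The following sketch is an illustration with a hypothetical function name, scsi_delete_candidates. It assumes the column order of the sample output, in which the sixth field is DRIVER OWNER, the eighth is the device file, and a valid path appears as a final field in square brackets.

```shell
# scsi_delete_candidates
# Reads "hwmgr show scsi" output on stdin and prints the SCSI DEVICEID and
# device file of each entry that is not open by a driver (DRIVER OWNER is
# 0) and that shows no first valid path.
scsi_delete_candidates() {
    awk '/^ *[0-9]+:/ && $6 == 0 && $NF !~ /^\[/ { print $2, $8 }'
}

# Example use on a system that provides hwmgr:
#   /sbin/hwmgr show scsi | scsi_delete_candidates
```

Treat the result as a list of candidates to review, not as a list of devices to delete automatically.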

To delete the SCSI device, specify the SCSI DEVICEID value with the delete option, and then review the SCSI database as follows:

# /sbin/hwmgr delete scsi -did 6
   hwmgr: The delete operation was successful.
# /sbin/hwmgr show scsi
 
      SCSI            DEVICE  DEVICE  DRIVER NUM  DEVICE FIRST
HWID: DEVICE HOSTNAME TYPE    SUBTYPE OWNER  PATH FILE   VALID
      ID                                                  PATH
  -------------------------------------------------------------
23:   0      bert     disk    none    2      1   dsk0   [0/3/0]
24:   1      bert     cdrom   none    0      1   cdrom0 [0/4/0]
25:   2      bert     disk    none    0      1   dsk1   [1/2/0]
30:   4      bert     tape    none    0      1   tape2  [1/6/0]
31:   3      bert     disk    none    0      1   dsk4   [1/4/0]
34:   5      bert     disk    none    0      1   dsk7   [2/5/0]

The component /dev/disk/dsk8 is successfully deleted.

3.4.11    Reconfiguring Disks Under RAID Arrays

When you reconfigure RAID arrays, the new block zero might be the same block as the previous block zero. This can lead to problems caused by applications that see the disklabel as valid even though it might extend beyond the end of the disk. After a scan, the system recognizes the new unit(s) as dskNN. Before using the disk, run the following command to zero any inappropriate label:

# /sbin/disklabel -z dskNN

Run this command when you construct a new unit on a RAID array or when you move one or more disks comprising a unit on a RAID array to connect them directly to a host bus adapter.

Next, run the disklabel command to create a new default label (or apply a preconfigured label from a proto file) as follows:

# /sbin/disklabel -rwn dskNN
# /sbin/disklabel -Rr dskNN PROTOFILE

3.4.12    Replacing a Failed SCSI Disk

When a SCSI disk fails, you might want to replace it in such a way that the replacement disk takes on hardware characteristics of the failed device, such as ownership of the same device special files. The redirect command option enables you to assign such characteristics. For example, if you have an HSZ (RAID) cabinet and a disk fails, you can hot-swap the failed disk and then use the redirect command option to bring the new disk on line as a replacement for the failed disk.

Do not use this procedure alone if a failed disk is managed by an application such as AdvFS or LSM. Before you can swap managed disks, you must put the disk management application into an appropriate state or remove the disk from the management application. See the appropriate documentation, such as the Logical Storage Manager and AdvFS Administration manuals.

Note

The replacement disk must be of the same type for the redirect operation to work.

The following example shows how you use the redirect option:

# /sbin/hwmgr show scsi
      SCSI          DEVICE DEVICE DRIVER NUM  DEVICE  FIRST
HWID: DEVICE- HOST- TYPE   SUB-   OWNER  PATH FILE    VALID
      ID      NAME         TYPE                       PATH
  ---------------------------------------------------------
 23:   0     fwod  disk   none   2      1    dsk0   [0/3/0]
 24:   1     fwod  cdrom  none   0      1    cdrom0 [0/4/0]
 25:   2     fwod  disk   none   0      1    dsk1   [1/2/0]
 30:   4     fwod  tape   none   0      1    tape2  [1/6/0]
 31:   3     fwod  disk   none   0      1    dsk4
 37:   5     fwod  disk   none   0      1    dsk10  [2/5/0]

This output shows a failed SCSI disk of HWID 31. The component has no valid paths. To replace this failed disk with a new disk that has device special file name /dev/disk/dsk4, and the same dev_t information, use the following procedure:

  1. Install the component as described in the hardware manual.

  2. Use the following command to find the new device:

    # /sbin/hwmgr scan scsi
    

    This command probes the SCSI subsystem for new devices and registers those devices. You can then repeat the show scsi command and obtain the SCSI device id (did) of the replacement device.

  3. Use the following command to reassign the component characteristics from the failed disk to the replacement disk. This example assumes that the SCSI device id (did) assigned to the new disk is 36:

    # /sbin/hwmgr redirect scsi -src 3 -dest 36
    

Note

If the redirect option fails to work for a disk, use the alternate procedure described in Section 3.4.13.
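The lookup in this procedure can be sketched as follows. The helper name redirect_cmd is hypothetical. It assumes that the failed disk is the only disk entry with an empty FIRST VALID PATH column, and that you supply the did of the replacement disk yourself after the scan. The helper only prints the redirect command; it does not run it.

```sh
#!/bin/sh
# Sketch: build the redirect command for a single failed disk.
# Reads "hwmgr show scsi" text on standard input; $1 is the SCSI device
# id (did) of the replacement disk.
# Assumption: exactly one disk entry has no valid path (the failed disk).
redirect_cmd() {
    new_did=$1
    awk -v dest="$new_did" '
        $1 ~ /^[0-9]+:$/ && $4 == "disk" && (NF < 9 || $9 == "[") {
            n++
            src = $2          # DEVICEID of the failed disk
        }
        END {
            if (n != 1) {
                print "expected exactly one failed disk, found " (n + 0) > "/dev/stderr"
                exit 1
            }
            print "/sbin/hwmgr redirect scsi -src " src " -dest " dest
        }'
}

# Typical use on a live system (not run here):
#   /sbin/hwmgr show scsi | redirect_cmd 36
```

Refusing to guess when more than one disk has no valid path keeps the sketch from pairing the replacement with the wrong failed device.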

3.4.13    Replacing a Failed SCSI Tape Drive (or Hard Disk)

When a SCSI tape drive fails, you might want to replace it in such a way that the replacement drive takes on hardware characteristics of the failed device, such as ownership of the same device special files. The redirect command option described in Section 3.4.12 might not work for certain SCSI tape drives or certain disks. Use the following procedure to ensure that the operation completes successfully. The following prerequisites and options apply to this procedure:

  1. Verify the failed component by using the following command:

    # /sbin/hwmgr show scsi
          SCSI          DEVICE DEVICE DRIVER NUM  DEVICE  FIRST
    HWID: DEVICE- HOST- TYPE   SUB-   OWNER  PATH FILE    VALID
          ID      NAME         TYPE                       PATH
      ---------------------------------------------------------
     31:  5       rocym  tape  none    2     1   tape0  [     ]
     
     
    

    The preceding output shows a failed SCSI tape drive with HWID 31. The drive has no valid paths. To replace this failed tape drive with a new tape drive that has the device special file name /dev/tape/tape0 and the same dev_t information, complete the remaining steps of this procedure.

  2. Install the component as described in the hardware manual.

  3. Scan for the new device by using the following command:

    # /sbin/hwmgr scan scsi | grep tape
    

  4. Use the following command to find the SCSI database entry for both the failed and the replacement device:

    # /sbin/hwmgr show scsi | grep tape
    

    You will see a new entry for the replacement device and an incomplete entry for the original (failed) device, as shown in the following sample output:

          SCSI              DEVICE DEVICE  DRIVER NUM  DEVICE FIRST
    HWID: DEVICEID HOSTNAME TYPE   SUBTYPE OWNER  PATH FILE   VALID
                                                              PATH
    ---------------------------------------------------------------
    31:   5       rocym     tape    none    2     1   tape0  [     ]
    35:   7       rocym     tape    none    0     1   tape5  [0/7/0]
     
     
    

  5. Use the hwmgr command to delete the database entries for the failed device, specifying its hardware identifier (HWID) as follows:

    # /sbin/hwmgr delete component -id 31 
     
    

  6. Rename the replacement device and transfer the device special files of the failed device by using the following command:

    # /sbin/dsfmgr -m tape5 tape0
    tape5=>tape0  tape5_d0=>tape0_d0  tape5_d1=>tape0_d1
    tape5_d2=>tape0_d2  tape5_d3=>tape0_d3  tape5_d4=>tape0_d4
    tape5_d5=>tape0c
    

If you are replacing a device that you cannot hot-swap, you must shut down the system. In such cases, modify the first two steps of the procedure as follows:

  1. Insert the replacement device and boot the system. A SCSI scan runs automatically at boot time, and you will see console messages indicating that the replacement device was found and registered.

  2. When the system is at single-user mode, verify that the replacement device was found by using the following command:

    # /sbin/hwmgr show scsi | grep device_type
    

    Replace the device_type variable with the type of device, such as disk or tape.

Proceed with Step 3 of the original procedure.
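Steps 5 and 6 of the procedure take their arguments from the show scsi output. The following sketch shows that lookup. The helper name tape_replace_cmds is hypothetical, and it assumes the white-space-separated column order of the sample output. It prints the two commands for review instead of running them.

```sh
#!/bin/sh
# Sketch: derive the delete and rename commands for a failed device.
# Reads "hwmgr show scsi" text on standard input.
#   $1 = HWID of the failed device, $2 = HWID of the replacement device
# Assumption: the eighth field of each entry is the DEVICE FILE name.
tape_replace_cmds() {
    failed_hwid=$1
    new_hwid=$2
    awk -v old="$failed_hwid:" -v new="$new_hwid:" '
        $1 == old { old_file = $8 }
        $1 == new { new_file = $8 }
        END {
            # Give up if either entry is missing from the output.
            if (old_file == "" || new_file == "") exit 1
            print "/sbin/hwmgr delete component -id " substr(old, 1, length(old) - 1)
            print "/sbin/dsfmgr -m " new_file " " old_file
        }'
}

# Typical use on a live system (not run here):
#   /sbin/hwmgr show scsi | tape_replace_cmds 31 35
```

With the sample output from step 4, the helper prints the same delete component and dsfmgr -m commands that steps 5 and 6 show.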

3.4.14    Using hwmgr to Replace a Cluster Member's Boot Disk

On a single system, the hwmgr command provides a redirect option which you use as part of the procedure to replace a failing disk. When you replace the failed disk, you use the redirect option to direct I/O from the failed component to the replacement device. This option redirects device special file names, cluster dev_t values, local dev_t values, logical ID, and HWID.

Only unique device identifiers (did) are accepted by the redirect option. In a cluster, device identifiers are not guaranteed to be unique and the command might fail as shown in the following example:

# /sbin/hwmgr redirect scsi -src source_did -dest target_did
# "Error (95) Cannot start operation." 

For the redirect operation to succeed, both or neither of the hardware identifiers must exist on each member of the cluster. Use the following procedure to ensure that the redirect operation works:

  1. Verify whether the source and destination components exist. Use the following command on each member of the cluster:

    # /sbin/hwmgr show scsi -did device_identifier
     
          SCSI           DEVICE DEVICE  DRIVER NUM  DEVICE FIRST
    HWID: DEVICEID HOST  TYPE   SUBTYPE OWNER  PATH FILE   VALID PATH
    32:   DID      rymoc disk   none     2     1    dsk1   [0/1/0]
    

  2. Follow this step only if the source component exists on other cluster members but the destination component does not.

    Configure the destination component on those cluster members as follows:

    # /sbin/hwmgr scan scsi
    

    Note

    The bus scan is an asynchronous operation. The system prompt returns immediately but that does not mean that the scan is complete. On systems with many devices, the scan can take several minutes to complete.

  3. Follow this step only if the destination component exists on other members of the cluster but the source component does not.

    Delete the destination component from those cluster members as follows:

     # /sbin/hwmgr delete scsi -did device_identifier
    

  4. You can now use the redirect option to direct I/O to the replacement drive.
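The per-member decision in steps 2 and 3 can be summarized as a small lookup. This is a sketch only: the function name member_action and its yes/no arguments are hypothetical, and you determine each answer yourself from the show scsi -did command in step 1.

```sh
#!/bin/sh
# Sketch: decide what to do on one cluster member before the redirect.
#   $1 = "yes" if the source did exists on this member, otherwise "no"
#   $2 = "yes" if the destination did exists on this member, otherwise "no"
#   $3 = destination did (used only when the entry must be deleted)
# Prints the command to run on that member, or "none" if both or neither
# of the identifiers exist there.
member_action() {
    case "$1/$2" in
        yes/yes|no/no) echo "none" ;;
        yes/no)        echo "/sbin/hwmgr scan scsi" ;;
        no/yes)        echo "/sbin/hwmgr delete scsi -did $3" ;;
        *)             echo "usage: member_action yes|no yes|no [did]" >&2
                       return 2 ;;
    esac
}

# Example: the source exists on this member but the destination does not.
#   member_action yes no 36
```

Remember that the bus scan is asynchronous, so repeat the check in step 1 after a scan before you run the redirect.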

3.4.15    Viewing the Persistence Database for the name Subsystem

The name persistence database stores information about the hardware topology of the system. This data is maintained by the kernel and includes data for controllers and buses. Use the show name command option to display persistence data that you can manipulate by using other hwmgr commands.

The following example shows typical output from the show name command option on a small system:

# /sbin/hwmgr show name -member ychain
 
 HWID:  NAME    HOSTNAME   PERSIST TYPE    PERSIST AT
-----------------------------------------------------
   13:  isp0    ychain     BUS             pci0 slot 5
    4:  pci0    ychain     BUS             nexus
   14:  scsi0   ychain     CONTROLLER      isp0 slot 0
   29:  tu0     ychain     CONTROLLER      pci0 slot 11

The following information is provided by the output:

3.4.16    Deleting and Removing a Component from the name Persistence Database

One of the options for manipulating the name subsystem is to remove components from the persistence database. The hwmgr command offers two methods of removal:

The following example shows typical output from the show name command option on a small system. You specify the variable name, which is the component name shown in the output from the show name command option described in Section 3.4.15:

# /sbin/hwmgr show name
 HWID:  NAME    HOSTNAME  PERSIST TYPE    PERSIST AT
 
------------------------------------------------------
   33:  aha0    fegin     BUS             eisa0 slot 7
   31:  ln0     fegin     CONTROLLER      eisa0 slot 5
    8:  pci0    fegin     BUS             ibus0 slot 0
   34:  scsi1   fegin     CONTROLLER      aha0 slot 0
   17:  scsi0   fegin     CONTROLLER      psiop0 slot 0
   15:  tu0     fegin     CONTROLLER      pci0 slot 0

Two SCSI adapters are shown in the preceding output. If scsi0 is the target of a remove operation, then scsi1 does not become scsi0. The location of the adapter persists at aha0 slot 0 and the name scsi1 is saved across boots.

To remove scsi0 and rename scsi1, use the following commands:

# /sbin/hwmgr remove name -entry scsi0
# /sbin/hwmgr edit name -entry scsi1 -parent_num 0
 
 

3.4.17    Optimizing the Hardware Databases

The number of stale paths might impact the system boot time, not because it takes any longer to read the database, but because the SCSI subsystem attempts to probe each path even if it is stale. Such stale paths can occur if you make many changes to the system's configuration, such as by moving storage to different adapters or if you remove or replace adapters. However, if you have inexplicably large numbers of stale paths on your system, it might indicate a configuration problem and you should consult your technical support representative before using the refresh option.

To remove stale paths, use the following option:

# /sbin/hwmgr refresh scsi

The preceding command deletes the stale paths to SCSI devices, except for any stale path that the system sees as the first path to the device.

The refresh component option is not strictly necessary and generally has no impact on boot but makes the output from commands easier to interpret. If you make significant hardware configuration changes, particularly when you remove and replace components, there will be many irrelevant entries in the command output. These unused entries are only visible when you display the component database by using the following command:

# /sbin/hwmgr show component

Use the following command to remove database entries for components that will never be returned to the system:

# /sbin/hwmgr refresh component

3.4.18    Renaming Components

Component names are based on the driver interfaces used, and on the instance of the device. For example, a component named tu0 is a NIC that supports the Tulip Ethernet interface (see tu(7)). A model DE500 NIC appears as a component named eeN in the name database (see ee(7)).

Based on its physical location, a component is assigned a persistence type and persistence location. This information is stored in the name database. The persistence type can be a network controller or a peripheral component interconnect (PCI) bus providing a series of slots. The persistence location of a component can be the logical address of a particular slot on a bus, such as slot 2 on PCI bus 1 or a main bus location (nexus).

The following procedure shows how you can assign your preferred names to components. You might want to do this to keep component names consistent across different systems so that scripts which address a specific component are easier to maintain.

Note

HP recommends that you let your system dynamically assign component names whenever possible. There are alternate procedures that enable you to preserve system customizations when updating your environment. For example, the supported method of cloning systems is the installation cloning utility. See the Installation Guide — Advanced Topics manual for your version of the operating system. Because component names are dynamic, consider making your local scripts and programs independent of component names. To aid you in this, the hwmgr and dsfmgr commands enable you to easily determine component names and component availability. Scripts and programs are much more dependable and portable if they do not address static names.

3.4.18.1    Identifying the Components

Use the following hwmgr command to view the content of the name database:

# /sbin/hwmgr show name
 HWID:  NAME    HOSTNAME    PERSIST TYPE    PERSIST AT
---------------------------------------------------------
   41:  ata1    rocym       BUS             pci0 slot 205
   39:  ata0    rocym       BUS             pci0 slot 105
   56:  ee2     rocym       CONTROLLER      pci3 slot 5
   19:  ee1     rocym       CONTROLLER      pci2 slot 5
   18:  ee0     rocym       CONTROLLER      pci2 slot 4
   54:  itpsa1  rocym       BUS             pci3 slot 4
 
 

The preceding truncated display shows three NICs in the system named ee0, ee1, and ee2. The display also provides the physical location of the network components, such as PCI bus 2, slot 5, for component ee1.

To change the names of the components to ee3, ee4 and ee5, use the procedure in Section 3.4.18.2.

3.4.18.2    Renaming the Components

Rename the components by using the following procedure:

  1. Use the following commands to remove the hardware persistence entries from the name database:

    # /sbin/hwmgr remove name -entry ee0
    # /sbin/hwmgr remove name -entry ee1
    # /sbin/hwmgr remove name -entry ee2
    

    Repeat the command for each component that you want to rename. This action does not affect any hardware component that is currently using the removed name. It only affects the persistence of the name across reboots.

    Note

    Using the remove option instead of the delete option preserves any attribute settings that exist for the components. For example, you can define your own user name for component ee0 by specifying a value for its user_name attribute. Your user-specified name (and any other customized settings) are not preserved if you use the delete option.

  2. Shut down and reboot the system by using the following command:

    # shutdown -r now
    

    As the system reboots, the persistent names are recreated. The new names are assigned in the order in which the hardware is probed. For example, ee0 is assigned to the first NIC discovered during hardware probe, ee1 is assigned to the next, and so on until all the components are discovered.

  3. Delete each component's previous name from the name database by using the following commands:

    # /sbin/hwmgr delete name -entry ee0
    # /sbin/hwmgr delete name -entry ee1
    # /sbin/hwmgr delete name -entry ee2
    

3.4.18.3    Verifying the Renaming Procedure

Verify the renaming procedure as follows:

  1. Use the following command to show which components map to a specific PCI slot. In this case, the command searches for ee* components:

    # /sbin/hwmgr show name | grep ee
    .
    .
    .
    56:  ee5     rocym           CONTROLLER      pci3 slot 5
    19:  ee4     rocym           CONTROLLER      pci2 slot 5
    18:  ee3     rocym           CONTROLLER      pci2 slot 4
    .
    .
    .
    

  2. Use the following command to ensure that all the incorrect names were permanently removed from the name databases:

    # /sbin/hwmgr show component | grep ee
    .
    .
    .
    18:  rocym      r---- none    ee3
    19:  rocym      r---- none    ee4
    56:  rocym      r---- none    ee5
    .
    .
    .
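The check in step 1 can also be scripted. The following sketch prints the component name that the name database records at a given location. The function name name_at is hypothetical, and the sketch assumes that the PERSIST AT text begins at the fifth white-space-separated field, as in the sample output.

```sh
#!/bin/sh
# Sketch: report which component name persists at a given location.
# Reads "hwmgr show name" text on standard input.
#   $1 = location exactly as shown in the PERSIST AT column,
#        for example "pci2 slot 4"
# Assumption: fields are HWID, NAME, HOSTNAME, PERSIST TYPE, PERSIST AT.
name_at() {
    awk -v want="$1" '
        $1 ~ /^[0-9]+:$/ {
            # Rejoin the PERSIST AT words (for example "pci2 slot 4").
            at = $5
            for (i = 6; i <= NF; i++) at = at " " $i
            if (at == want) print $2
        }'
}

# Typical use on a live system (not run here):
#   /sbin/hwmgr show name | name_at "pci2 slot 4"
```

Comparing the printed name with the name that you expect for each slot confirms the result without reading the full listing.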
    

If the renaming process appears to be unsuccessful, use the troubleshooting suggestions described in the following table:

Problem: A component is not visible when you use the hwmgr command.
Possible solutions: Shut down the system and use the console commands to ensure that the component is visible as part of the configuration. If the component is not visible, see the hardware documentation for component test and verification procedures.

Problem: The component is visible, but is apparently generating errors.
Possible solutions: If the component is visible to the console, reboot the system and watch the boot messages for any component-specific errors. Such messages might be logged to the log files in the /var/adm directory.

Problem: The system reboots correctly, but there are post-boot component errors.
Possible solutions: Use the Event Viewer (EVM) to view binary events (binlogd) by using the following command:

# sysman event_viewer

See the System Administration manual for your version of the operating system for more information about the event viewer and associated diagnostic tools.

Problem: After a successful reboot and renaming procedure, one or more components still have the incorrect name.
Possible solutions: After you have ensured that the component is connected to the system and is visible to the Hardware Manager, repeat the renaming procedure.

Problem: You cannot rename the component.
Possible solutions: Under unusual circumstances, the content of the hardware databases can become corrupt. Do not attempt to edit or refresh these databases. Contact your technical support service for assistance or perform a full installation and reapply your customizations.

3.4.19    Relocating (Moving) a Component

In Tru64 UNIX, component names are designed to be unique. Some hardware components derive their unique names based on their physical location. In particular, most buses and controllers derive their system-wide unique name from their physical location in the system. If you move such a component from one physical location in the system to another, it might appear as a new device, with a new name, when you subsequently reboot the system.

A script or program that addresses a specific component (such as a network card) might not find that component if its name changes. If you need to move a component within a system, you might want to preserve its original component name. The procedure described in this section shows how to preserve a component name, using a Memory Channel card as an example of the moved component.

3.4.19.1    Finding the Component Name

Many component names are based on the driver interfaces used by the device, and the instance of the device. For example, a component named mchan0 is a Memory Channel card, instance 0. Based on the physical location of the device, components are assigned a persistence type and persistence location. This information is stored in the name database.

Use the following hwmgr command to view the content of the name database:

# /sbin/hwmgr show name
HWID:  NAME    HOSTNAME  PERSIST TYPE    PERSIST AT
  62:  ata0    host12    BUS             pci0 slot 15
  45:  emx2    host12    BUS             pci0 slot 1
 228:  mchan0  host12    CONTROLLER      pci1 slot 9
.
.
.

A component might be installed in a bus, such as a peripheral component interconnect (PCI) bus that provides a series of slots. In the preceding output, the PERSIST AT field is comprised of:

To display the topology of subcomponents under a particular component, use the hwmgr view hierarchy command, specifying a component by its hardware identifier, as follows:

#  /sbin/hwmgr view hierarchy -id 610
HWID:   hardware hierarchy
-------------------------------------------------------
 610:   bus pci2
 611:     connection pci2slot0
 617:       scsi_adapter itpsa0
 618:         scsi_bus scsi0
 674:           disk bus-0-targ-5-lun-0 cdrom51
 613:     connection pci2slot1
 619:       scsi_adapter itpsa1
 620:         scsi_bus scsi1
 208:           disk bus-1-targ-0-lun-0 dsk1188
 209:           disk bus-1-targ-1-lun-0 dsk1189
 210:           disk bus-1-targ-2-lun-0 dsk1190
 615:     connection pci2slot2
 621:       network tu0
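To pick out only the disk entries from a hierarchy listing, you can filter the output. The following sketch assumes the layout of the preceding sample, in which each disk line holds the HWID, the word disk, the bus-target-LUN address, and the device name. The function name hierarchy_disks is hypothetical.

```sh
#!/bin/sh
# Sketch: list the disk-class entries from "hwmgr view hierarchy" output.
# Reads the listing on standard input and prints "HWID DEVICE_NAME".
# Assumption: the second field is the component type and the last field
# is the device name, as in the sample output.
hierarchy_disks() {
    awk '
        $1 ~ /^[0-9]+:$/ && $2 == "disk" {
            sub(/:$/, "", $1)     # drop the colon after the HWID
            print $1, $NF
        }'
}

# Typical use on a live system (not run here):
#   /sbin/hwmgr view hierarchy -id 610 | hierarchy_disks
```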
 
 

3.4.19.2    Relocating the Component

Relocate the component as follows:

  1. Use the following command to find the current bus and slot location of the component that you intend to relocate. This example specifies a memory channel card (mchan):

    # /sbin/hwmgr show name | grep mchan
    228:  mchan0  host12             CONTROLLER      pci1 slot 9
    

  2. Modify the hardware name database, specifying the new location for the component as follows:

    # /sbin/hwmgr edit name -entry mchan0 -parent_num 5 -slot 8
    

    This command relocates the component mchan0 from its present location on PCI bus 1 (pci1), slot 9 to a new location at PCI bus 5 (pci5), slot 8.

    Note

    If you cannot obtain the name and location pairing that you want, you might need to delete existing names in the database by using the hwmgr remove name command or hwmgr delete name command. However, take care not to remove required entries for existing (unmoved) components.

  3. Shut down the system as follows:

    # /usr/sbin/shutdown now
    

  4. Physically relocate the component to its new slot.

  5. Reboot the system to single-user mode as follows:

    >>>boot -flags s
    

  6. Verify that the component is found at the correct location:

    # /sbin/hwmgr show name | grep mchan
    228:  mchan0  host12             CONTROLLER      pci5 slot 8
    

  7. Boot the system to multiuser mode as follows:

    # [Ctrl/D]
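The location checks in steps 1 and 6 can be sketched as a helper that extracts the PCI bus number and slot number for a named component. The function name persist_at is hypothetical. It assumes that the PERSIST AT column has the form pciN slot M, as in the sample output, and it prints nothing for components that persist elsewhere.

```sh
#!/bin/sh
# Sketch: print "BUS_NUMBER SLOT" for a component in the name database.
# Reads "hwmgr show name" text on standard input; $1 is the component name.
# Assumption: PERSIST AT reads "pciN slot M" for the component of interest.
persist_at() {
    awk -v name="$1" '
        $2 == name && $5 ~ /^pci[0-9]+$/ && $6 == "slot" {
            bus = $5
            sub(/^pci/, "", bus)   # keep only the bus number
            print bus, $7
        }'
}

# Typical use on a live system (not run here):
#   /sbin/hwmgr show name | persist_at mchan0
```

Before the move, the sample entry yields 1 9. After the relocation in this example, the same check should yield 5 8.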
    

If the relocation process appears to be unsuccessful, use the following troubleshooting suggestions:

Problem: A component is not visible when you use the hwmgr command.
Possible solutions: Most likely, this problem is caused by a failure to properly install the component in its new location.

Shut down the system and use the console (>>>) commands to ensure that the component is visible as part of the configuration. If it is visible, repeat the relocation process.

If the component is not visible, see its hardware documentation for component test and verification procedures.

Problem: The component is visible, but is apparently generating errors.
Possible solutions: If the component is visible to the console, reboot the system and watch the boot messages for any component-specific errors. Such messages might be logged to the log files in the /var/adm directory.

Problem: The system reboots correctly, but there are post-boot component errors.
Possible solutions: Use the Event Viewer (EVM) to view binary events (binlogd) by using the following command:

# sysman event_viewer

See the System Administration manual for your version of the operating system for more information about the event viewer and associated diagnostic tools.

Problem: After a successful reboot and relocation procedure, one or more devices still have the incorrect name.
Possible solutions: Verify that you inserted the component in the correct new location.

After you have ensured that the component is connected to the system and is visible to the Hardware Manager, repeat the relocation procedure.