TITLE: HSZ_NAM_ISSUE Dual Redundant HSZ Naming Issue in Tru64 UNIX V5.0
 
Copyright 2000 Compaq Computer Corporation.  All rights reserved

   
   DATE: 3 February 2000

   PRODUCT: Tru64 UNIX V5.0
   SOURCE:  Compaq Computer Corporation        


   TITLE: 

     Dual Redundant HSZ naming issue; Tru64 UNIX V5.0 or later; 
     Impacts fail-over function.



   =================================================================

   PRODUCT NAME(S) IMPACTED:

   PRODUCT FAMILY(IES):                           PRODUCT NUMBERS:

   Storage         _X_                            HSZ70
   Systems         _X_                            Alpha
   Networks        ___
   PC              ___                            ________________
   Software        _X_                            Tru64 UNIX 5.0
   Other (specify) ___                            ________________


PROBLEM STATEMENT:
==================

The Tru64 UNIX V5.0 software uses the product identifier (PID) field
provided by HSZ controllers to assist in uniquely identifying logical
disk units.  If dual controllers with different PID fields are present,
then the logical units will not be properly identified as the same units,
and controller failover will not occur.  The hardware management software
[hwmgr(8a)] will not properly recognize the unit after the failover, and
all access attempts to the unit will fail.  In addition, since after the
failover attempt, hwmgr will see the unit as a different logical unit, a
new device name will exist for that logical unit.  In effect, there will
be 2 device names for each logical device (1 name for each of the 2
controllers in the redundant pair).

The worst impact of this problem is that it can remain undetected until
an error occurs that would generate a controller failover.  At the
crucial time when a failover is needed and expected, it will not work.

In HSZ controller firmware HSOF V7.7, this problem will be corrected by
ensuring that the PID fields are synchronized.  For more details, refer
to the "FIRMWARE NOTE" below.


Background Info:

All SCSI devices contain what is known as a vendor identifier (VID) and a
product identifier (PID).  These identifiers are installed in the device
during manufacturing.  For dual redundant HSZ controllers to operate
correctly with Tru64 UNIX 5.0, the VID and PID contained in each of the
2 controllers in the set must match.  It is possible, however, that some
controllers may have different PID fields.  For example:

   "HSZ70" versus  "HSZ70 (C) DEC"


CONFIGURATIONS AFFECTED:
========================

Tru64 UNIX Version 5.0 or later systems with Dual Redundant HSZ70
controllers with firmware version less than HSOF V7.7. (See Firmware
Note at the end of this article.)


PROBLEM SYMPTOM:
================

Controller failover will not complete, and after the failover attempt all
access attempts to the unit will fail.



PROBLEM SOLUTION:
=================

Due to the severe consequences of this problem (fail-over inoperative)
and the "invisible" nature (you won't know about it until fail-over is
attempted), we recommend a pro-active approach to this problem. Field
personnel should check existing Tru64 UNIX V5 installations and take
necessary corrective action as described in this article.  Tru64/Digital
UNIX installations that are to be upgraded to V5 should be examined for
this issue as a part of upgrade planning.


How to determine the PID field value for HSZ units:
---------------------------------------------------
The PID can be determined in several ways.  A local terminal can be
attached to the HSZ maintenance terminal port, or the CLI window of SWCC
can be used. The "show this" and "show other" commands will display the
PID field on the first line of the output.  The following example
illustrates a case of mis-matched PID values:

  HSZ> show this
  Controller:
        HSZ70    (C) DEC ZG41400123 Firmware V25Z-0, Hardware  A02
  .
   .
    .
  HSZ> show other
  Controller:
        HSZ70 ZG41800340 Firmware V25Z-0, Hardware 0000
  .
   .
    .

(Note "HSZ70 (C) DEC" vs. "HSZ70")
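The comparison above can also be scripted against captured CLI output.
The following is a minimal Python sketch, not a Compaq tool.  It assumes
the line under "Controller:" has the layout shown in the example (PID,
then serial number, then the word "Firmware").  The function name
controller_pid is hypothetical, and collapsing runs of blanks is a
display convenience only.

```python
import re

# Assumed layout of the line that follows "Controller:":
#   <PID> <serial number> Firmware <version>, Hardware <revision>
CONTROLLER_LINE = re.compile(r"^\s*(?P<pid>.+?)\s+(?P<serial>\S+)\s+Firmware\s")


def controller_pid(show_output):
    """Return the PID found in 'show this' / 'show other' output, or None."""
    lines = show_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Controller:" and i + 1 < len(lines):
            match = CONTROLLER_LINE.match(lines[i + 1])
            if match:
                # Collapse runs of blanks so only the visible text is compared.
                return " ".join(match.group("pid").split())
    return None


this_out = """Controller:
        HSZ70    (C) DEC ZG41400123 Firmware V25Z-0, Hardware  A02
"""
other_out = """Controller:
        HSZ70 ZG41800340 Firmware V25Z-0, Hardware 0000
"""

if controller_pid(this_out) != controller_pid(other_out):
    print("PID mismatch:", controller_pid(this_out), "vs.", controller_pid(other_out))
```

With the two sample outputs from this article, the sketch reports a
mismatch between "HSZ70 (C) DEC" and "HSZ70".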

An alternate method of examining the PID field is to use the scu command
from the host:

  > scu
  scu> set nexus bus a target b lun c
  scu> show inq
  <....>
        Product Identification: HSZ70
(vs.)
        Product Identification: HSZ70     (C) DEC
  scu>

When using scu, a preferred target from each HSZ controller must be
examined when determining if they use the same PID field.
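If the scu output for one target on each controller is saved to a file,
the same comparison can be made from the host.  This is a small sketch
under the assumption that the "Product Identification:" label appears
exactly as in the example above.  The names inquiry_pid and pids_match
are hypothetical.

```python
def inquiry_pid(show_inq_output):
    """Return the 'Product Identification' value from scu 'show inq' output."""
    for line in show_inq_output.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == "Product Identification":
            # Collapse runs of blanks; an empty field yields an empty string.
            return " ".join(value.split())
    return None


def pids_match(pid_a, pid_b):
    """True only when both PIDs were found and are identical."""
    return pid_a is not None and pid_a == pid_b


target_a = "        Product Identification: HSZ70\n"
target_b = "        Product Identification: HSZ70     (C) DEC\n"
print(pids_match(inquiry_pid(target_a), inquiry_pid(target_b)))
```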


What to do:
-----------
If the PID fields for each redundant pair are the same, then no action is
required.  If the PID fields for a pair are found to be different, then
replace one of the dual controllers so that a match can be obtained.
Which name is matched does not matter, only that they match exactly.


When to perform the check:
--------------------------
The check should be performed:
 - prior to the installation of the Tru64 UNIX 5.0 software;
 - prior to an upgrade from a version previous to 5.0;
 - if an HSZ70 controller in a dual-redundant set is replaced on a
   Tru64 UNIX 5.0 system.

Any necessary corrective action should be taken prior to the install
of/upgrade to Tru64 UNIX V5.


What if the system is already running V5 with mis-matched controllers?
----------------------------------------------------------------------
The first step is to correct the mis-matched names.  This means calling
field service and getting one of the mis-matched units replaced, or when
available (see below), upgrading the HSZ firmware so the names can be
matched.


In the following example, the list of known disk devices is displayed
using the "hwmgr -view devices -category disk" command.  The problem
devices on which to focus are the ones with the mis-matched PID (and
corresponding "Model") fields:

# hwmgr -view devices -category disk
 HWID: Device Name          Mfg      Model            Location
------------------------------------------------------------------------------
   27: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c      DEC      RZ26L    (C) DEC bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c    DEC      RRD44   (C) DEC  bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c      DEC      HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1

Note that the HSZ "Model" field of dsk2c and dsk3c does not match the HSZ
Model of dsk4c and dsk5c.
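On larger configurations it may help to group the HSZ units by their
"Model" string instead of reading the listing by eye.  The sketch below
is an illustration only.  It assumes the row layout shown above (HWID,
device name, Mfg/Model text, and the location as the last field), and
the function name hsz_models is hypothetical.  Note that it treats every
HSZ unit on the system as one group, so a system with several redundant
pairs would need the result checked pair by pair.

```python
import re
from collections import defaultdict

# HWID, device name, free-form Mfg/Model text, location (last field).
ROW = re.compile(r"^\s*(\d+):\s+(\S+)\s+(.*\S)\s+(\S+)\s*$")


def hsz_models(view_output):
    """Map each HSZ model string to the device names that report it."""
    groups = defaultdict(list)
    for line in view_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or a row without a model
        _hwid, device, middle, _location = match.groups()
        pos = middle.find("HSZ")
        if pos == -1:
            continue  # not an HSZ unit
        model = " ".join(middle[pos:].split())
        groups[model].append(device)
    return dict(groups)


sample = """\
 HWID: Device Name          Mfg      Model            Location
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   37: /dev/disk/dsk2c      DEC      HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1
"""

models = hsz_models(sample)
if len(models) > 1:
    print("Mis-matched HSZ models:", sorted(models))
```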

To correct this situation you have 2 choices.

Choice 1:

    Reinstall V5 using this exact procedure:

        Upgrade HSZ firmware to achieve an exact PID match between
        the two controllers.

        At the console level, force devices to be named from scratch:

                P00>>> set bootdef_dev ""

        Boot Tru64 UNIX V5 installation media and install V5.


Choice 2:

    Perform this manual corrective procedure:

      - Upgrade the HSZ firmware to achieve PID match

      - Manually redirect the problematic scsi disks to new disk devices
        that will be created during a hardware scan executed after the
        PID match.

    The remainder of this article demonstrates an example of this
    manual procedure.


# hwmgr -view devices -category disk
(OR...)
# hwmgr -view dev -cat disk
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
   27: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c      DEC      RZ26L    (C) DEC bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c    DEC      RRD44   (C) DEC  bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c      DEC      HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1
# df
Filesystem       512-blocks        Used   Available Capacity  Mounted on
/dev/disk/dsk4a      338542      276436       28250    91%    /
/dev/disk/dsk4g     3389096      520056     2530130    18%    /usr
/proc                     0           0           0   100%    /proc


We will match the firmware of the HSZ controllers so that both
controllers report the PID used by the controller of the system disk:
"HSZ70 (C) DEC".

Shut the system down to single-user mode.  This will prevent background
processes from "touching" the disks while they are being changed:

# shutdown now


Determine which HSZ controller is the "master" controller.  If the root
file system is on this HSZ pair, attach a terminal/CLI connection and
execute the "show unit" command, followed by "show" for the individual
units.  The output indicates to which controller the root unit is
"ONLINE":

HSZ> show unit
    LUN                                      Uses
--------------------------------------------------------------

  D300                                       DISK300
  D301                                       DISK320
  D600                                       DISK600
  D601                                       DISK630
HSZ> show d600
    LUN                                      Uses
--------------------------------------------------------------

  D600                                       DISK600
  ...
        State:
          ONLINE to this controller
          Not reserved
          PREFERRED_PATH = THIS_CONTROLLER
  ...
HSZ> show d300
    LUN                                      Uses
--------------------------------------------------------------

  D300                                       DISK300
  ...
        State:
          ONLINE to the other controller
          PREFERRED_PATH = OTHER_CONTROLLER
  ...

D600 should be on the master; therefore, stop the controller that is
serving the other units (unit D300 in our example) with the "set
nofailover" command.
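When the "show" output for several units has been captured, the
"ONLINE to ..." line can be picked out programmatically.  This is a
minimal sketch that assumes the two phrasings shown in the example
above.  The function name online_controller is hypothetical.

```python
import re

# The two phrasings that appear in the sample output above.
ONLINE = re.compile(r"ONLINE to (this|the other) controller")


def online_controller(show_unit_output):
    """Return 'this', 'other', or None if no ONLINE line is present."""
    match = ONLINE.search(show_unit_output)
    if not match:
        return None
    return "this" if match.group(1) == "this" else "other"


d600 = """\
  D600                                       DISK600
        State:
          ONLINE to this controller
          Not reserved
          PREFERRED_PATH = THIS_CONTROLLER
"""
d300 = """\
  D300                                       DISK300
        State:
          ONLINE to the other controller
          PREFERRED_PATH = OTHER_CONTROLLER
"""
print("D600:", online_controller(d600), " D300:", online_controller(d300))
```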

In our example, issue "set nofailover" from the "HSZ>" prompt.  Then
insert a card containing the new version of the firmware into the slot of
the controller that was shut down, and reboot that controller.  (When it
restarts, you can disregard the "Controllers misconfigured." error
message.)  For more information about updating your HSZ70 firmware, refer
to the HSZ70 Configuration Manual and the Release Notes.

Now switch the CLI line to the other HSZ controller and issue the "set
failover copy=other" command.  After the controller reboots, switch back
to the original controller, issue the "shutdown" command, upgrade its
firmware, and allow the original HSZ controller to reboot.

Finally, from the host, issue the "hwmgr -scan scsi" command.  This will
update system information to find units with the "correct" PID field.


Summary:

Host commands           Master HSZ commands     "Other" HSZ commands
--------------------------------------------------------------------------
hwmgr -view dev
  [find mis-matched HSZ units]

shutdown now
  [determine which HSZ is the master]

                        set nofailover
                                                [upgrade to new F/W]
                                                set failover copy=other
                        shutdown
                        [upgrade to new F/W]

hwmgr -scan scsi   [ -bus 1 ]

  A word about "hwmgr -scan scsi"...
  Depending on the size of your configuration, the scan may take several
  minutes to complete.  The presence of tape devices will further
  increase the delay to complete the scan.  For this reason, you may wish to
  use the -bus qualifier to specify the bus you want to scan.  The correct bus
  number can be determined by examining the "location" field of the hwmgr
  -view devices output:

# hwmgr -view devices
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
...
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0

# hwmgr -scan scsi -bus 1
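The bus number can be read out of the location string mechanically.  The
sketch below only builds the command text shown above; it does not run
hwmgr.  It assumes locations always have the "bus-N-targ-N-lun-N" form
seen in the listings, and the function name scan_command is hypothetical.

```python
import re

# Location format as it appears in the hwmgr listings in this article.
LOCATION = re.compile(r"^bus-(\d+)-targ-(\d+)-lun-(\d+)$")


def scan_command(location):
    """Build the bus-limited scan command for a device location string."""
    match = LOCATION.match(location)
    if not match:
        raise ValueError("unrecognized location: %r" % location)
    bus = int(match.group(1))
    return "hwmgr -scan scsi -bus %d" % bus


print(scan_command("bus-1-targ-6-lun-0"))
```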

In order to understand which devices need to be redirected to the
newly-created devices, examine the current list of devices prior to
rebooting the system:

# hwmgr -view devices
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
    4: /dev/kevm
   27: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c      DEC      RZ26L    (C) DEC bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c    DEC      RRD44   (C) DEC  bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c      DEC      HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1
   41: /dev/cport/scp0               HSZ70            bus-1-targ-3-lun-0
   42: /dev/cport/scp1               HSZ70    (C) DEC bus-1-targ-6-lun-0
   44: /dev/disk/dsk6c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-1


Note that you can ignore the control port "scp0" device.  We will need to
redirect the following device names (with invalid PIDs)

   37: /dev/disk/dsk2c      DEC      HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70            bus-1-targ-3-lun-1

...to the following new device names (containing valid PIDs)

   44: /dev/disk/dsk6c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-1

Remember the following:

    HWID 37 will be redirected to 44
    HWID 38 will be redirected to 45
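The pairing above follows a simple rule: an old and a new disk entry
share the same location, and only the new one carries the corrected PID.
The following sketch expresses that rule.  It is an illustration under
those assumptions, takes rows that have already been split into fields,
and uses the hypothetical name redirect_pairs.

```python
def redirect_pairs(rows, good_pid):
    """Pair stale and new disk entries that share a location.

    rows: iterable of (hwid, device_name, model, location) tuples.
    Returns a sorted list of (old_hwid, new_hwid) tuples.
    """
    by_location = {}
    for hwid, device, model, location in rows:
        if not device.startswith("/dev/disk/"):
            continue  # skip control ports such as /dev/cport/scp0
        by_location.setdefault(location, []).append((hwid, model))

    pairs = []
    for entries in by_location.values():
        stale = [hwid for hwid, model in entries if model != good_pid]
        fresh = [hwid for hwid, model in entries if model == good_pid]
        # Only an unambiguous one-to-one match is reported.
        if len(stale) == 1 and len(fresh) == 1:
            pairs.append((stale[0], fresh[0]))
    return sorted(pairs)


rows = [
    (34, "/dev/disk/dsk0c", "RZ28 (C) DEC", "bus-0-targ-1-lun-0"),
    (37, "/dev/disk/dsk2c", "HSZ70", "bus-1-targ-3-lun-0"),
    (38, "/dev/disk/dsk3c", "HSZ70", "bus-1-targ-3-lun-1"),
    (39, "/dev/disk/dsk4c", "HSZ70 (C) DEC", "bus-1-targ-6-lun-0"),
    (40, "/dev/disk/dsk5c", "HSZ70 (C) DEC", "bus-1-targ-6-lun-1"),
    (41, "/dev/cport/scp0", "HSZ70", "bus-1-targ-3-lun-0"),
    (42, "/dev/cport/scp1", "HSZ70 (C) DEC", "bus-1-targ-6-lun-0"),
    (44, "/dev/disk/dsk6c", "HSZ70 (C) DEC", "bus-1-targ-3-lun-0"),
    (45, "/dev/disk/dsk7c", "HSZ70 (C) DEC", "bus-1-targ-3-lun-1"),
]
print(redirect_pairs(rows, "HSZ70 (C) DEC"))
```

For the listing in this article the result is the same two pairs noted
above: 37 to 44 and 38 to 45.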


At this point, you should reboot the system (ONLY TO SINGLE-USER MODE).
Then mount the root file system to enable writing to the disk:

# shutdown -h now
.
 .
  .
P00>>> boot -flag s dkb600
.
 .
  .
Starting secondary cpu 1

INIT: SINGLE-USER MODE
# mountroot
.
 .
  .

Examine the output of "hwmgr -view devices" and "hwmgr -show scsi".
(The scsi DID output will be necessary to execute the hwmgr -redirect
commands.)


# hwmgr -view dev -cat disk
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
   27: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c      DEC      RZ26L    (C) DEC bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c    DEC      RRD44   (C) DEC  bus-0-targ-5-lun-0
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1
   44: /dev/disk/dsk6c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-1

# hwmgr -show scsi

       SCSI               DEVICE    DEVICE  DRIVER NUM  DEVICE FIRST
 HWID:DEVICEID HOSTNAME   TYPE      SUBTYPE OWNER  PATH FILE   VALID PATH
-------------------------------------------------------------------------
   34:  0        ajkitt     disk      none    0      1    dsk0   [0/1/0]
   35:  1        ajkitt     disk      none    0      1    dsk1   [0/2/0]
   36:  2        ajkitt     cdrom     none    0      1    cdrom0 [0/5/0]
   37:  3        ajkitt     disk      none    0      1    (null)
   38:  4        ajkitt     disk      none    0      1    (null)
   39:  5        ajkitt     disk      none    2      1    dsk4   [1/6/0]
   40:  6        ajkitt     disk      none    0      1    dsk5   [1/6/1]
   44:  7        ajkitt     disk      none    0      1    dsk6   [1/3/0]
   45:  8        ajkitt     disk      none    0      1    dsk7   [1/3/1]


Note the following correspondence, and recall our intentions:

 HWID = SCSI DID
 ----   --------
  37  =  3
  38  =  4
  44  =  7
  45  =  8

 REDIRECT
   HWID     SCSI DID
 --------   --------
 37 to 44    3 to 7
 38 to 45    4 to 8

The redirection is accomplished by the following hwmgr commands:

# hwmgr -redirect scsi -src 3 -dest 7
hwmgr: Redirect operation was successful

# hwmgr -redirect scsi -src 4 -dest 8
hwmgr: Redirect operation was successful
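The translation from HWID pairs to the SCSI device IDs used on the
command line can be sketched as follows.  The code only produces the
command text; it does not invoke hwmgr.  It assumes each data row of
"hwmgr -show scsi" begins with "HWID: DEVICEID" as in the listing above.
The function names are hypothetical.

```python
import re

# A data row starts with "<HWID>:" followed by the SCSI device ID.
SCSI_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s")


def hwid_to_did(show_scsi_output):
    """Map HWID to SCSI device ID from 'hwmgr -show scsi' output."""
    mapping = {}
    for line in show_scsi_output.splitlines():
        match = SCSI_ROW.match(line)
        if match:
            mapping[int(match.group(1))] = int(match.group(2))
    return mapping


def redirect_commands(show_scsi_output, hwid_pairs):
    """Build one redirect command per (old_hwid, new_hwid) pair."""
    dids = hwid_to_did(show_scsi_output)
    return [
        "hwmgr -redirect scsi -src %d -dest %d" % (dids[old], dids[new])
        for old, new in hwid_pairs
    ]


show_scsi = """\
 HWID:DEVICEID HOSTNAME   TYPE      SUBTYPE OWNER  PATH FILE   VALID PATH
   37:  3        ajkitt     disk      none    0      1    (null)
   38:  4        ajkitt     disk      none    0      1    (null)
   44:  7        ajkitt     disk      none    0      1    dsk6   [1/3/0]
   45:  8        ajkitt     disk      none    0      1    dsk7   [1/3/1]
"""
for command in redirect_commands(show_scsi, [(37, 44), (38, 45)]):
    print(command)
```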


Final result, and proof that all devices are reachable:

# hwmgr -view devices
 HWID: Device Name          Mfg      Model            Location
 ------------------------------------------------------------------------------
    4: /dev/kevm
   27: /dev/disk/floppy0c            3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c      DEC      RZ28     (C) DEC bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c      DEC      RZ26L    (C) DEC bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c    DEC      RRD44   (C) DEC  bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c      DEC      HSZ70    (C) DEC bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c      DEC      HSZ70    (C) DEC bus-1-targ-6-lun-1
   42: /dev/cport/scp1               HSZ70    (C) DEC bus-1-targ-3-lun-0
# hwmgr -show scsi

      SCSI                DEVICE    DEVICE  DRIVER NUM  DEVICE FIRST
 HWID:DEVICEID HOSTNAME   TYPE      SUBTYPE OWNER  PATH FILE   VALID PATH
-------------------------------------------------------------------------
   34:  0        ajkitt     disk      none    0      1    dsk0   [0/1/0]
   35:  1        ajkitt     disk      none    0      1    dsk1   [0/2/0]
   36:  2        ajkitt     cdrom     none    0      1    cdrom0 [0/5/0]
   37:  3        ajkitt     disk      none    0      1    dsk2   [1/3/0]
   38:  4        ajkitt     disk      none    0      1    dsk3   [1/3/1]
   39:  5        ajkitt     disk      none    2      1    dsk4   [1/6/0]
   40:  6        ajkitt     disk      none    0      1    dsk5   [1/6/1]
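In the listings in this article, the entries that needed redirection
showed "(null)" in the device file column before the redirect and a
device name afterwards.  Treating "(null)" as the marker of an entry
that still needs attention is an assumption drawn from that sample
output only.  Under that assumption, a quick check could look like this
(the function name is hypothetical):

```python
import re

# HWID, SCSI device ID, then the rest of the row.
SCSI_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(.*)$")


def hwids_without_device_file(show_scsi_output):
    """Return the HWIDs whose row shows '(null)' instead of a device file."""
    missing = []
    for line in show_scsi_output.splitlines():
        match = SCSI_ROW.match(line)
        if match and "(null)" in match.group(3):
            missing.append(int(match.group(1)))
    return missing


before = """\
   37:  3        ajkitt     disk      none    0      1    (null)
   38:  4        ajkitt     disk      none    0      1    (null)
   39:  5        ajkitt     disk      none    2      1    dsk4   [1/6/0]
"""
after = """\
   37:  3        ajkitt     disk      none    0      1    dsk2   [1/3/0]
   38:  4        ajkitt     disk      none    0      1    dsk3   [1/3/1]
   39:  5        ajkitt     disk      none    2      1    dsk4   [1/6/0]
"""
print(hwids_without_device_file(before), hwids_without_device_file(after))
```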

# mount /usr
# file /dev/rdisk/dsk*c
/dev/rdisk/dsk0c:       character special (19/22) SCSI #0 "RZ28" disk #0
(SCSI ID #1) (SCSI LUN #0)
/dev/rdisk/dsk1c:       character special (19/38) SCSI #0 "RZ26L" disk #1
(SCSI ID #2) (SCSI LUN #0)
/dev/rdisk/dsk2c:       character special (19/70) SCSI #1 "HSZ70" disk #3
(SCSI ID #3) (SCSI LUN #0)
/dev/rdisk/dsk3c:       character special (19/86) SCSI #1 "HSZ70" disk #4
(SCSI ID #3) (SCSI LUN #1)
/dev/rdisk/dsk4c:       character special (19/102) SCSI #1 "HSZ70" disk #5
(SCSI ID #6) (SCSI LUN #0)
/dev/rdisk/dsk5c:       character special (19/118) SCSI #1 "HSZ70" disk #6
(SCSI ID #6) (SCSI LUN #1)

The system is now ready for multi-user mode:

# ^D
INIT: New run level: 3
.
 .
  .
     ---------------------------------------------------------------

                                FIRMWARE NOTE
                                -------------

        HSZ70 controller firmware HSOF V7.7 will include checks at boot
        time that will prevent the booting controller from entering a
        dual configuration (either transparent failover or multibus
        failover) if the product id fields of both the controllers are
        not the same. The controller will issue the following warning on
        the CLI prompt:

        Controllers misconfigured. - Type SHOW THIS_CONTROLLER

        The output of the "SHOW THIS_CONTROLLER" command will contain:

        Controller:
               HSZ70 ZG81110847 Firmware V77Z-0, Hardware  H01
               Configured for dual-redundancy with ZG71600468
               Controllers misconfigured -- product id mismatch, a
                    SET FAILOVER COPY= is required to re-synchronize
                    controllers

        When this command is followed by "SET NOFAILOVER" (or "SET
        NOMULTIBUS_FAILOVER" in the case of a multibus failover) and
        "SET FAILOVER COPY=THIS/OTHER" (or "SET MULTIBUS_FAILOVER"), the
        pair will synchronize the product id fields based on the source
        of the "COPY=THIS" or "COPY=OTHER" command. Note that the SET
        FAILOVER (or SET MULTIBUS_FAILOVER) command has to be issued from
        a controller that is running HSZ70 controller firmware HSOF V7.7.

        HSZ70 controller firmware HSOF V7.7 will be available in March,
        2000. Customers who signed the standard MDDS contract (Media and
        Documentation Distribution Service) will automatically receive
        the HSOF V7.7 firmware via "automatic update".  Non-contract
        customers may order the following kit, which is Tru64 UNIX
        specific and contains HSOF V7.7:

                               QB-5SBAB-MA.7.7




    *****************************< NOTE>********************************
    *                                                                  *
    * INFORMATION IN THIS DOCUMENT REPRESENTS OPERATIONAL EXPERIENCES  *
    * AND SUGGESTIONS BY COMPAQ OR PARTNER EMPLOYEES.  COMPAQ SHALL    *
    * NOT BE RESPONSIBLE FOR ANY ERRORS OR OMISSIONS CONTAINED IN      *
    * THIS DOCUMENT, AND RESERVES THE RIGHT TO MAKE CHANGES TO IT      *
    * WITHOUT NOTICE.                                                  *
    *                                                                  *
    ********************************************************************

Files on this server are as follows:
»hsz_nam_issue.README
»hsz_nam_issue.CHKSUM
»hsz_nam_issue.tar