HSZ_NAM_ISSUE Dual Redundant HSZ Naming Issue in Tru64 UNIX V5.0

TITLE: HSZ_NAM_ISSUE Dual Redundant HSZ Naming Issue in Tru64 UNIX V5.0

Copyright 2000 Compaq Computer Corporation. All rights reserved

DATE:    3 February 2000
PRODUCT: Tru64 UNIX V5.0
SOURCE:  Compaq Computer Corporation

TITLE:   Dual Redundant HSZ naming issue; Tru64 UNIX V5.0 or later;
         Impacts fail-over function.

=================================================================

PRODUCT NAME(S) IMPACTED:

PRODUCT FAMILY(IES):        PRODUCT NUMBERS:

Storage          _X_        HSZ70
Systems          _X_        Alpha
Networks         ___
PC               ___        ________________
Software         _X_        Tru64 UNIX 5.0
Other (specify)  ___        ________________

PROBLEM STATEMENT:
==================

The Tru64 UNIX V5.0 software uses the product identifier (PID) field
provided by HSZ controllers to assist in uniquely identifying logical
disk units. If dual controllers with different PID fields are present,
then the logical units will not be properly identified as the same
units, and controller failover will not occur. The hardware management
software [hwmgr(8a)] will not properly recognize the unit after the
failover, and all access attempts to the unit will fail.

In addition, because hwmgr sees the unit as a different logical unit
after the failover attempt, a new device name will exist for that
logical unit. In effect, there will be 2 device names for each logical
device (1 name for each of the 2 controllers in the redundant pair).

The worst impact of this problem is that it can remain undetected until
an error occurs that would generate a controller failover. At the
crucial time when a failover is needed and expected, it will not work.

In HSZ controller firmware HSOF V7.7, this problem will be corrected by
ensuring that the PID fields are synchronized. For more details, refer
to the "FIRMWARE NOTE" below.

Background Info:

All SCSI devices contain what is known as a vendor identifier (VID) and
a product identifier (PID). These identifiers are installed in the
device during manufacturing.
For dual redundant HSZ controllers to operate correctly with Tru64 UNIX
5.0, the VID and PID contained in each of the 2 controllers in the set
must match. It is possible, however, that some controllers may have
different PID fields. For example:

        "HSZ70"   versus   "HSZ70 (C) DEC"

CONFIGURATIONS AFFECTED:
========================

Tru64 UNIX Version 5.0 or later systems with Dual Redundant HSZ70
controllers running firmware earlier than HSOF V7.7. (See Firmware
Note at the end of this article.)

PROBLEM SYMPTOM:
================

Controller failover will not complete, and after the failover attempt
all access attempts to the unit will fail.

PROBLEM SOLUTION:
=================

Due to the severe consequences of this problem (fail-over inoperative)
and its "invisible" nature (you won't know about it until fail-over is
attempted), we recommend a pro-active approach to this problem.

Field personnel should check existing Tru64 UNIX V5 installations and
take necessary corrective action as described in this article.
Tru64/Digital UNIX installations that are to be upgraded to V5 should
be examined for this issue as a part of upgrade planning.

How to determine the PID field value for HSZ units:
---------------------------------------------------

The PID can be determined in several ways. A local terminal can be
attached to the HSZ maintenance terminal port, or the CLI window of
SWCC can be used. The "show this" and "show other" commands will
display the PID field on the first line of the output. The following
example illustrates a case of mis-matched PID values:

        HSZ> show this
        Controller:
                HSZ70 (C) DEC ZG41400123 Firmware V25Z-0, Hardware A02
        . . .
        HSZ> show other
        Controller:
                HSZ70 ZG41800340 Firmware V25Z-0, Hardware 0000
        . . .

        (Note "HSZ70 (C) DEC" vs. "HSZ70")

An alternate method of examining the PID field is to use the scu
command from the host:

        > scu
        scu> set nexus bus a target b lun c
        scu> show inq
        <....>
        Product Identification: HSZ70
                (vs.)
        Product Identification: HSZ70 (C) DEC
        scu>

When using scu, a preferred target from each HSZ controller must be
examined to determine whether both controllers use the same PID field.

What to do:
-----------

If the PID fields for each redundant pair are the same, then no action
is required.

If the PID fields for a pair are found to be different, then replace
one of the dual controllers so that a match can be obtained. Which name
is matched does not matter, only that the two match exactly.

When to perform the check:
--------------------------

The check should be performed:

  - prior to the installation of the Tru64 UNIX 5.0 software;
  - prior to an upgrade from a version previous to 5.0;
  - if an HSZ70 controller in a dual-redundant set is replaced on a
    Tru64 UNIX 5.0 system.

Any necessary corrective action should be taken prior to the install
of/upgrade to Tru64 UNIX V5.

What if the system is already running V5 with mis-matched controllers?
----------------------------------------------------------------------

The first step is to correct the mis-matched names. This means calling
field service and getting one of the mis-matched units replaced, or,
when available (see below), upgrading the HSZ firmware so the names can
be matched.

In the following example, the list of known disk devices is displayed
using the "hwmgr -view devices -category disk" command.
The problem devices on which to focus are the ones with the mis-matched
PID (and corresponding "Model") fields:

# hwmgr -view devices -category disk

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
   27: /dev/disk/floppy0c       3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c     DEC  RZ26L (C) DEC    bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c   DEC  RRD44 (C) DEC    bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c     DEC  HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-1

Note that the HSZ "Model" field of dsk2c and dsk3c does not match the
HSZ Model of dsk4c and dsk5c. To correct this situation you have 2
choices.

Choice 1: Reinstall V5 using this exact procedure:

  - Upgrade HSZ firmware to achieve an exact PID match between the two
    controllers.
  - At the console level, force devices to be named from scratch:

        P00>>> set bootdef_dev ""

  - Boot Tru64 UNIX V5 installation media and install V5.

Choice 2: Perform this manual corrective procedure:

  - Upgrade the HSZ firmware to achieve PID match
  - Manually redirect the problematic scsi disks to new disk devices
    that will be created during a hardware scan executed after the PID
    match.

The remainder of this article demonstrates an example of this manual
procedure.

# hwmgr -view devices -category disk
        (OR...)
# hwmgr -view dev -cat disk

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
   27: /dev/disk/floppy0c       3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c     DEC  RZ26L (C) DEC    bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c   DEC  RRD44 (C) DEC    bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c     DEC  HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-1

# df
Filesystem       512-blocks    Used  Available Capacity  Mounted on
/dev/disk/dsk4a      338542  276436      28250    91%    /
/dev/disk/dsk4g     3389096  520056    2530130    18%    /usr
/proc                     0       0          0   100%    /proc

We will match the firmware of the HSZ controllers so that the PID of
both controllers matches the PID used by the controller of the system
disk: "HSZ70 (C) DEC".

Shut the system down to single-user mode. This will prevent background
processes from "touching" the disks while they are being changed:

# shutdown now

Determine which HSZ controller is the "master" controller. If the root
file system is on this HSZ pair, attach a terminal/CLI connection and
execute the show command. The output will show to which controller the
root unit is "ONLINE":

HSZ> show unit
    LUN                                      Uses
--------------------------------------------------------------
  D300                                       DISK300
  D301                                       DISK320
  D600                                       DISK600
  D601                                       DISK630

HSZ> show d600
    LUN                                      Uses
--------------------------------------------------------------
  D600                                       DISK600
        ...
        State:
          ONLINE to this controller
          Not reserved
          PREFERRED_PATH = THIS_CONTROLLER
        ...

HSZ> show d300
    LUN                                      Uses
--------------------------------------------------------------
  D300                                       DISK300
        ...
        State:
          ONLINE to the other controller
          PREFERRED_PATH = OTHER_CONTROLLER
        ...

D600 should be on the master. Therefore, stop the controller that is
serving the other units (unit D300 in our example) via the
"set nofailover" command.
In our example, from "HSZ>" issue "set nofailover". Then insert a card
having the new version of firmware into the slot of the controller that
was shut down, and reboot that controller. (When it restarts, you can
disregard the "Controllers misconfigured." error message.) For more
information about updating your HSZ70 firmware, refer to the HSZ70
Configuration Manual and the Release Notes.

Now switch the CLI line to the other HSZ controller and issue the
"set failover copy=other" command. After the controller reboots, switch
back to the original controller, issue the "shutdown" command, upgrade
its firmware, and allow the original HSZ controller to reboot.

Finally, from the host, issue the "hwmgr -scan scsi" command. This will
update system information to find units with the "correct" PID field.

Summary:

Host commands        Master HSZ commands      "Other" HSZ commands
--------------------------------------------------------------------------
hwmgr -view dev
 [find mis-matched
  HSZ units]
shutdown now
                     [determine which HSZ
                      is the master]
                     set nofailover
                                              [upgrade to new F/W]
                                              set failover copy="other"
                     shutdown
                     [upgrade to new F/W]
hwmgr -scan scsi [ -bus 1 ]

A word about "hwmgr -scan scsi"...

Depending on the size of your configuration, the scan may take several
minutes to complete. The presence of tape devices will further increase
the time needed to complete the scan. For this reason, you may wish to
use the -bus qualifier to specify the bus you want to scan. The correct
bus number can be determined by examining the "Location" field of the
hwmgr -view devices output:

# hwmgr -view devices

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
 ...
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0

# hwmgr -scan scsi -bus 1

In order to understand which devices need to be redirected to the
newly-created devices, examine the current list of devices prior to
rebooting the system:

# hwmgr -view devices

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
    4: /dev/kevm
   27: /dev/disk/floppy0c       3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c     DEC  RZ26L (C) DEC    bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c   DEC  RRD44 (C) DEC    bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c     DEC  HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-1
   41: /dev/cport/scp0          HSZ70            bus-1-targ-3-lun-0
   42: /dev/cport/scp1          HSZ70 (C) DEC    bus-1-targ-6-lun-0
   44: /dev/disk/dsk6c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-1

Note that you can ignore the control port "scp0" device.

We will need to redirect the following device names (with invalid
PIDs)

   37: /dev/disk/dsk2c     DEC  HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70            bus-1-targ-3-lun-1

...to the following new device names (containing valid PIDs)

   44: /dev/disk/dsk6c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-1

Remember the following:

        HWID 37 will be redirected to 44
        HWID 38 will be redirected to 45

At this point, you should reboot the system (ONLY TO SINGLE-USER MODE).
Then mount the root file system to enable writing to the disk:

# shutdown -h now
. . .
P00>>> boot -flag s dkb600
. . .
Starting secondary cpu 1
INIT: SINGLE-USER MODE
# mountroot
. . .

Examine the output of "hwmgr -view devices" and "hwmgr -show scsi".
(The scsi DID output will be necessary to execute the hwmgr -redirect
commands.)
# hwmgr -view dev -cat disk

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
   27: /dev/disk/floppy0c       3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c     DEC  RZ26L (C) DEC    bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c   DEC  RRD44 (C) DEC    bus-0-targ-5-lun-0
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-1
   44: /dev/disk/dsk6c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-1

# hwmgr -show scsi

        SCSI               DEVICE  DEVICE   DRIVER NUM  DEVICE FIRST
 HWID:  DEVICEID HOSTNAME  TYPE    SUBTYPE  OWNER  PATH FILE   VALID PATH
-------------------------------------------------------------------------
   34:  0        ajkitt    disk    none     0      1    dsk0   [0/1/0]
   35:  1        ajkitt    disk    none     0      1    dsk1   [0/2/0]
   36:  2        ajkitt    cdrom   none     0      1    cdrom0 [0/5/0]
   37:  3        ajkitt    disk    none     0      1    (null)
   38:  4        ajkitt    disk    none     0      1    (null)
   39:  5        ajkitt    disk    none     2      1    dsk4   [1/6/0]
   40:  6        ajkitt    disk    none     0      1    dsk5   [1/6/1]
   44:  7        ajkitt    disk    none     0      1    dsk6   [1/3/0]
   45:  8        ajkitt    disk    none     0      1    dsk7   [1/3/1]

Note the following correspondence, and recall our intentions:

        HWID  =  SCSI DID
        ----     --------
         37   =   3
         38   =   4
         44   =   7
         45   =   8

        REDIRECT
        HWID          SCSI DID
        --------      --------
        37 to 44      3 to 7
        38 to 45      4 to 8

The redirection is accomplished by the following hwmgr commands:

# hwmgr -redirect scsi -src 3 -dest 7
hwmgr: Redirect operation was successful
# hwmgr -redirect scsi -src 4 -dest 8
hwmgr: Redirect operation was successful

Final result, and proof that all devices are reachable:

# hwmgr -view devices

 HWID: Device Name         Mfg  Model            Location
 ------------------------------------------------------------------------------
    4: /dev/kevm
   27: /dev/disk/floppy0c       3.5in floppy     fdi0-unit-0
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   35: /dev/disk/dsk1c     DEC  RZ26L (C) DEC    bus-0-targ-2-lun-0
   36: /dev/disk/cdrom0c   DEC  RRD44 (C) DEC    bus-0-targ-5-lun-0
   37: /dev/disk/dsk2c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   40: /dev/disk/dsk5c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-1
   42: /dev/cport/scp1          HSZ70 (C) DEC    bus-1-targ-3-lun-0

# hwmgr -show scsi

        SCSI               DEVICE  DEVICE   DRIVER NUM  DEVICE FIRST
 HWID:  DEVICEID HOSTNAME  TYPE    SUBTYPE  OWNER  PATH FILE   VALID PATH
-------------------------------------------------------------------------
   34:  0        ajkitt    disk    none     0      1    dsk0   [0/1/0]
   35:  1        ajkitt    disk    none     0      1    dsk1   [0/2/0]
   36:  2        ajkitt    cdrom   none     0      1    cdrom0 [0/5/0]
   37:  3        ajkitt    disk    none     0      1    dsk2   [1/3/0]
   38:  4        ajkitt    disk    none     0      1    dsk3   [1/3/1]
   39:  5        ajkitt    disk    none     2      1    dsk4   [1/6/0]
   40:  6        ajkitt    disk    none     0      1    dsk5   [1/6/1]

# mount /usr
# file /dev/rdisk/dsk*c
/dev/rdisk/dsk0c: character special (19/22) SCSI #0 "RZ28" disk #0 (SCSI ID #1) (SCSI LUN #0)
/dev/rdisk/dsk1c: character special (19/38) SCSI #0 "RZ26L" disk #1 (SCSI ID #2) (SCSI LUN #0)
/dev/rdisk/dsk2c: character special (19/70) SCSI #1 "HSZ70" disk #3 (SCSI ID #3) (SCSI LUN #0)
/dev/rdisk/dsk3c: character special (19/86) SCSI #1 "HSZ70" disk #4 (SCSI ID #3) (SCSI LUN #1)
/dev/rdisk/dsk4c: character special (19/102) SCSI #1 "HSZ70" disk #5 (SCSI ID #6) (SCSI LUN #0)
/dev/rdisk/dsk5c: character special (19/118) SCSI #1 "HSZ70" disk #6 (SCSI ID #6) (SCSI LUN #1)

The system is now ready for multi-user mode:

# ^D
INIT: New run level: 3
. . .
<>

---------------------------------------------------------------

FIRMWARE NOTE
-------------

HSZ70 controller firmware HSOF V7.7 will include checks at boot time
that prevent the booting controller from entering a dual configuration
(either transparent failover or multibus failover) if the product id
fields of the two controllers are not the same. The controller will
issue the following warning on the CLI prompt:

        Controllers misconfigured. - Type SHOW THIS_CONTROLLER

The output of the "SHOW THIS_CONTROLLER" command will contain:

        Controller:
                HSZ70 ZG81110847 Firmware V77Z-0, Hardware H01
                Configured for dual-redundancy with ZG71600468
                Controllers misconfigured -- product id mismatch,
                a SET FAILOVER COPY= is required to re-synchronize
                controllers

When this command is followed by "SET NOFAILOVER" (or
"SET NOMULTIBUS_FAILOVER" in the case of a multibus failover) and
"SET FAILOVER COPY=THIS/OTHER" (or "SET MULTIBUS_FAILOVER"), the pair
will synchronize the product id fields based on the source of the
"COPY=THIS" or "COPY=OTHER" command.

Note that the SET FAILOVER (or SET MULTIBUS_FAILOVER) command has to be
issued from a controller that is running HSZ70 controller firmware HSOF
V7.7.

HSZ70 controller firmware HSOF V7.7 will be available in March, 2000.
Customers who signed the standard MDDS contract (Media and
Documentation Distribution Service) will automatically receive the HSOF
V7.7 firmware via "automatic update". Non-contract customers may order
the following kit, which is Tru64 UNIX specific and contains HSOF V7.7:

        QB-5SBAB-MA.7.7

*****************************< NOTE>********************************
*                                                                  *
* INFORMATION IN THIS DOCUMENT REPRESENTS OPERATIONAL EXPERIENCES  *
* AND SUGGESTIONS BY COMPAQ OR PARTNER EMPLOYEES. COMPAQ SHALL     *
* NOT BE RESPONSIBLE FOR ANY ERRORS OR OMISSIONS CONTAINED IN      *
* THIS DOCUMENT, AND RESERVES THE RIGHT TO MAKE CHANGES TO IT      *
* WITHOUT NOTICE.                                                  *
*                                                                  *
********************************************************************
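
Illustrative sketch (not part of the original advisory):

The bookkeeping in the manual procedure, that is, pairing each stale
HWID with the new HWID at the same bus/target/LUN and translating both
to SCSI device IDs, might be sketched in Python as shown here, run on
any workstation against saved copies of the "hwmgr -view devices" and
"hwmgr -show scsi" output. The function name plan_redirects, the
regular expressions, and the assumption that the saved output looks
like the listings in this article are all hypothetical. The script only
prints candidate commands; verify every pair by hand before running any
hwmgr -redirect command.

```python
import re

# Hypothetical parser: one disk row of "hwmgr -view devices" that carries a
# bus location, e.g. "37: /dev/disk/dsk2c DEC HSZ70 bus-1-targ-3-lun-0".
VIEW_ROW = re.compile(
    r"^\s*(\d+):\s+/dev/disk/\S+\s+(.*?)\s+(bus-\d+-targ-\d+-lun-\d+)\s*$"
)
# First two columns of "hwmgr -show scsi": HWID and SCSI device id (DID).
SCSI_ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s")


def plan_redirects(view_text, scsi_text, good_model):
    """Return (src_did, dest_did) pairs for units listed twice at a location.

    good_model is the PID string both controllers now report, for example
    "HSZ70 (C) DEC". An entry whose Mfg/Model text does not end with that
    string is treated as the stale device name.
    """
    by_location = {}
    for line in view_text.splitlines():
        m = VIEW_ROW.match(line)
        if m:
            hwid, mfg_model, location = m.groups()
            by_location.setdefault(location, []).append((int(hwid), mfg_model))

    did = {}
    for line in scsi_text.splitlines():
        m = SCSI_ROW.match(line)
        if m:
            did[int(m.group(1))] = int(m.group(2))

    pairs = []
    for location in sorted(by_location):
        entries = by_location[location]
        old = [h for h, model in entries if not model.endswith(good_model)]
        new = [h for h, model in entries if model.endswith(good_model)]
        # Only act on the unambiguous case: exactly one stale and one new
        # entry at this location, both present in the SCSI database listing.
        if len(old) == 1 and len(new) == 1 and old[0] in did and new[0] in did:
            pairs.append((did[old[0]], did[new[0]]))
    return pairs


# Sample input copied from the listings in this article.
SAMPLE_VIEW = """\
   34: /dev/disk/dsk0c     DEC  RZ28 (C) DEC     bus-0-targ-1-lun-0
   37: /dev/disk/dsk2c     DEC  HSZ70            bus-1-targ-3-lun-0
   38: /dev/disk/dsk3c     DEC  HSZ70            bus-1-targ-3-lun-1
   39: /dev/disk/dsk4c     DEC  HSZ70 (C) DEC    bus-1-targ-6-lun-0
   41: /dev/cport/scp0          HSZ70            bus-1-targ-3-lun-0
   44: /dev/disk/dsk6c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-0
   45: /dev/disk/dsk7c     DEC  HSZ70 (C) DEC    bus-1-targ-3-lun-1
"""
SAMPLE_SCSI = """\
   34:  0        ajkitt    disk    none     0      1    dsk0   [0/1/0]
   37:  3        ajkitt    disk    none     0      1    (null)
   38:  4        ajkitt    disk    none     0      1    (null)
   39:  5        ajkitt    disk    none     2      1    dsk4   [1/6/0]
   44:  7        ajkitt    disk    none     0      1    dsk6   [1/3/0]
   45:  8        ajkitt    disk    none     0      1    dsk7   [1/3/1]
"""

if __name__ == "__main__":
    for src, dest in plan_redirects(SAMPLE_VIEW, SAMPLE_SCSI, "HSZ70 (C) DEC"):
        # Print only; the operator decides whether to run each command.
        print(f"hwmgr -redirect scsi -src {src} -dest {dest}")
```

With the sample listings, the sketch proposes the same two redirects
used in the worked example (SCSI DID 3 to 7 and 4 to 8). Locations
with anything other than exactly one stale and one new entry are
skipped on purpose, so an unusual configuration produces no suggestion
rather than a wrong one.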



This patch can be found at any of these sites:

Colorado Site
Georgia Site



Files on this server are as follows:

hsz_nam_issue.README
hsz_nam_issue.CHKSUM
hsz_nam_issue.tar
