PROBLEM: (93957) (PATCH ID: OSF540-073)
********
New ddr tables provide device recognition support for our latest high-end tape drive, the SDLT160/320. Without the tables, density and compression settings would not be possible, and the drive would operate as a 'generic' tape device. Customers would not see any benefit from purchasing the newest, fastest, highest-density drive we have available in the DLT product line.

PROBLEM: (94953) (PATCH ID: OSF540-089)
********
This patch is necessary in order for Tru64 to recognize the Ultrium 2 tape drive and the device classes AIT and Travan, which are planned for future support.

PROBLEM: (95265) (PATCH ID: OSF540-233)
********
Updating the DDR database via the merge script causes duplicated parameters.

PROBLEM: (none) (PATCH ID: OSF540-232)
********
This patch enables rewind-after-reset behavior for the Ultrium 2 tape drive. Without it, the tape will not rewind after a reset.

PROBLEM: (92681, 94943) (PATCH ID: OSF540-122)
********
This patch fixes a possible system hang condition that could occur during Smart Array error recovery. If the configuration includes a Smart Array and a system hang occurs, apply this patch.

PROBLEM: (POR) (PATCH ID: OSF540-131)
********
Prior to this change, /usr/sbin/envmond did not issue EVM events to mark environmental sensor (PS, fan, and temperature) status changes. These updates enable /usr/sbin/envmond to issue EVM events when environmental sensors change state.

PROBLEM: (90277) (PATCH ID: OSF540-094)
********
This patch fixes a system panic, "PWS_CCB_QUE_REMOVE: ccb not on any list", caused by a device or bus reset occurring during the execution of a command to a media changer, such as a tape library. Command retries occur to a media changer if a reset is detected after a successful open call to the media changer device completes.
The problem is that an internal queue used for command retries becomes corrupted due to a synchronization issue between the media changer driver and common CAM code.

PROBLEM: (92541) (PATCH ID: OSF540-077)
********
If an attempt is made to open a RAID device that is not attached to the system, a panic will result. This could happen if a controller was attached to a system, then removed, and the system rebooted.

PROBLEM: (94731) (PATCH ID: OSF540-138)
********
This patch adds an event which indicates that the soft or hard error count has changed on the device identified in the event.

PROBLEM: (91846) (PATCH ID: OSF540-022)
********
This patch fixes a situation where mounting a valid cdrom the first time fails with 'No valid filesystem exists on the partition' and subsequent mounts of the same cdrom work. For example:

    1. Eject CD.
    2. Insert v5.1A ASSOCIATED PRODUCTS VOL 1.
    3. 'mount -r /dev/disk/cdrom0c /mnt' works.
    4. 'umount /mnt'
    5. Eject CD. Swap for 'Associated Products VOL 2'.
    6. 'mount -r /dev/disk/cdrom0c /mnt' fails with 'No valid filesystem exists on this partition'.
    7. Trying step 6 again will complete fine.
    8. Go to step 1.

The problem is that code added to the generic cam disk driver to support size expansion and prevent size contraction of hard disks is interfering with removable media devices, such as cdroms and floppy disks. Thus, as cdroms of different sizes get swapped, errant behaviour can occur when mounting the cdrom drive.

PROBLEM: (IPMT, 93846) (PATCH ID: OSF540-004)
********
One: (STL327474) This patch is applicable only if tapes are used with programs that are not generally used for writing and reading tapes. In other words, regular tape backup software and programs like tar and cpio have no need for this patch. Only "homegrown" or other programs which might write to tapes with one block size, and then read the same tape with a smaller block size, would have use for this patch.
The patch provides a configurable setting which can be set to cause an error to be returned from any read from tape that requests fewer bytes of data than exist in the tape block being read.

Two: (93846) If an application opens a device through the compression device special file and then closes and reopens the device through the non-compression device special file, this problem will occur, and the data will be written compressed.

PROBLEM: (94943) (PATCH ID: OSF540-107)
********
This fix allows hardware event notifications from a Smart Array controller that occur while the system is not booted to be logged into the binary.errlog when the system is booted. This is useful in diagnosing logical volume failures should they occur.

PROBLEM: (94209, 85057, 93713, 94694, 82424) (PATCH ID: OSF540-074)
********
Fix for problem when reading /etc/ddr.dbase.
--------------------------------------------
/sbin/ddr_config is a tool that reads the text file /etc/ddr.dbase and produces the binary Dynamic Device Recognition (DDR) database file /etc/ddr.db that the Tru64 Operating System reads to obtain device driver settings.

One of the optional parameters for a device in /etc/ddr.dbase is ReadyTimeSeconds, which determines how long before an I/O request times out on that device. Some tape devices can take 300 seconds (5 minutes) or more to become ready. If the ReadyTimeSeconds value in /etc/ddr.dbase is larger than 255, /sbin/ddr_config will print an error message, refuse to accept it, and revert to the default value of 45 seconds.

If you have a tape drive that takes a long time to get ready and you experience I/O timeouts, consider installing this patch. It contains a new ddr_config that allows ReadyTimeSeconds values up to 86400 seconds (24 hours).
You may then edit /etc/ddr.dbase (or a copy of it) and increase ReadyTimeSeconds for your tape drive, for example:

    ReadyTimeSeconds = 300

Recompile /etc/ddr.dbase by issuing the command:

    ddr_config -c [filename]

If you make sure that all devices of that type are unmounted and nobody is using any of them when you issue ddr_config, the change will take effect without rebooting your system.

Fix for problem with NUMA Disk Statistics.
------------------------------------------
When a system with multiple CPUs issued I/Os to the same disk from more than one CPU, disk usage statistics sometimes showed more than 100% disk utilization. Contributions from different CPUs were not merged correctly. This fix makes the disk usage report correctly on multiple-CPU systems, also known as NUMA systems.

The problem was visible when using the table() system call or the "collect -sd" command. If you issue a table() call with the argument TBL_DKINFO, the data returned contains an element called di_time. The amount by which di_time increases between table() calls reflects how much a given disk has been in use. Please refer to the documentation or man pages for details. The command "collect -sd" uses the table() feature and presents the disk usage in the column labeled "%BSY" (percent busy), which should range from 0.00 to 100.00.

Fix for problem with system crash during boot.
----------------------------------------------
If some nodes of a cluster are rebooted, and the nodes being rebooted have access to a quorum disk, then sometimes the node being rebooted or an up node crashes with a kernel memory fault.

Fix for misleading warning message.
-----------------------------------
This patch fixes a problem in the CAM subsystem where it would print out "bad block number" to the error log on a recovered read error. The string has been changed to "block number."

PROBLEM: (93770) (PATCH ID: OSF540-078)
********
Negative device IDs may be displayed during boot.
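As a rough illustration of the NUMA disk statistics fix in OSF540-074 above, the Python sketch below shows why per-CPU busy time that is simply added together can exceed 100% of the sampling window, while merging overlapping busy periods cannot. It is not the kernel's implementation: the interval representation, the function names, and the sample numbers are all assumptions made for the example, and these notes do not describe how di_time is actually accumulated.

```python
# Illustrative sketch only; names and data are hypothetical, not Tru64 code.

def naive_busy_time(per_cpu_intervals):
    """Add up every CPU's busy intervals, ignoring overlap between CPUs."""
    return sum(end - start
               for intervals in per_cpu_intervals
               for start, end in intervals)


def merged_busy_time(per_cpu_intervals):
    """Length of the union of all busy intervals, counting overlap once."""
    all_intervals = sorted(iv for intervals in per_cpu_intervals
                           for iv in intervals)
    total = 0
    cur_start = cur_end = None
    for start, end in all_intervals:
        if cur_end is None or start > cur_end:
            # Gap before this interval: close out the previous merged run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current run: extend it instead of double counting.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def percent_busy(busy_time, elapsed_time):
    """Busy time as a percentage of the sampling window (same units)."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return 100.0 * busy_time / elapsed_time


if __name__ == "__main__":
    # Two CPUs issuing I/O to the same disk during a 100-tick window.
    sample = [[(0, 80)], [(40, 100)]]
    print(percent_busy(naive_busy_time(sample), 100))
    print(percent_busy(merged_busy_time(sample), 100))
```

With the sample data, the naive sum gives 140.0 and the merged figure gives 100.0, which matches the 0.00 to 100.00 range expected in the "%BSY" column.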
PROBLEM: (88667) (PATCH ID: OSF540-093)
********
This patch fixes the reporting of device monitoring events and hardware errors during disk recovery from the disk driver to the binary errlog.

The first problem is that the message "Device monitoring events for Test Unit Ready CCB" is incorrectly getting logged into the binary errorlog. This message is intended to be logged only when a disk is not responding. It is currently getting logged under conditions when the disk has actually responded. This results in the binary errorlog being saturated with errant device monitoring events.

The second problem is that the disk recovery process misses an early opportunity to detect and report a nonrecoverable hardware error. Instead of immediately logging a "Recovery failed" message and stopping the recovery when a nonrecoverable hardware error is reported, the disk driver logs a "Recovery progress event, this is NOT an error" information message that includes event information indicating that a "HARDWARE ERROR - Nonrecoverable hardware error" event has occurred, and then continues with the recovery process. This results in a confusing binary errorlog message and the unnecessary continuation of a recovery process that will not succeed.

PROBLEM: (91730) (PATCH ID: OSF540-083)
********
Using hwmgr to delete disk devices can improperly interrupt device scan operations. The most common symptom is a SCSI device appearing in the system with a hardware ID of "0" and no device special file.

PROBLEM: (94628) (PATCH ID: OSF540-235)
********
A path status change which occurs while in the process of opening an HSG80 disk device can cause a hang in the processing of the disk label. A dump forced with the system in this state will show the hung thread waiting below calls through cdisk_online and cdisk_read_label.

PROBLEM: (95115, 95116, 95117, 95118) (PATCH ID: OSF540-231)
********
Not applicable.
PROBLEM: (92811, 95443) (PATCH ID: OSF540-302)
********
This problem caused a number of error conditions to be seen during HSZ and HSG failovers. These conditions include ADVFS domain panics, kernel memory faults, and stalled IO.

PROBLEM: (95596) (PATCH ID: OSF540-446)
********
This patch fixes a problem where the CAM I/O subsystem does not always zero the CAM Control Blocks which are used by the peripheral drivers. This can cause a kernel memory fault or system hang when the subsystem is low on memory. The CAM Control Blocks which are not zeroed are allocated from a look-aside list at elevated IPL when there is no memory available in the regular pool.

PROBLEM: (95039) (PATCH ID: OSF540-367)
********
On HSG80, cdisk_handle_pr_ccb was returning failure without properly returning the reason code. This resulted in the same device being retried on several paths. This fix returns the proper error code in case of hardware failure, which eliminates unnecessary retries by the higher layer. The failure case is unit attention with ascq = 0xf002.

PROBLEM: (IT_G05939, CH_G05474, CH_G05475, BCGMC00BM, DEK088095, LU_G05635, DE_G05540, IT_G05434) (PATCH ID: OSF540-419)
********
This patch prevents 'ccfg_MakeDeviceIdentWWID: Invalid device ID' messages from being generated when they should not be.

PROBLEM: (95445, 95446, 95596, 95611) (PATCH ID: OSF540-396)
********
PROBLEM: A command timeout may occur due to the smart array driver losing a command completion. If the following error log entry is generated, this patch should be loaded.
    Sequence number of error: 633
    Time of error entry: Sun Nov 24 19:39:10 2002
    Host name: caninedelig
    SCSI CAM ERROR PACKET
    SCSI device class: CISS (Smart Array)
    Bus Number: 5
    Target Number: 4
    Lun Number: 0
    Routine name that logged the event: ciss_cmd_timeout
    Event information: Command timed out...resetting controller
    Event information: Active CCB at time of error

PROBLEM: A disklabel command can fail if the smart array controller is in error recovery when the command is executed. If a disklabel command fails and the following error log entry is generated, this patch should be loaded.

    Sequence number of error: 673
    Time of error entry: Sun Nov 24 19:39:12 2002
    Host name: caninedelig
    SCSI CAM ERROR PACKET
    SCSI device class: DISK
    Bus Number: 5
    Target Number: 2
    Lun Number: 0
    Routine name that logged the event: cdisk_online
    Event information: ccmn_path_setup3 has reported no viable paths
    Hardware detected event: Hard Error Detected
    Event information: Hardware ID = 86
    Device Name: COMPAQ LOGICAL VOLUME 2.94

PROBLEM: Kernel Memory Fault. If the following stack trace is seen in the crash dump, this patch should be loaded.

    1 panic
    2 event_timeout
    3 printf
    4 panic(s = 0xfffffc0000bb0be0 = "kernel memory fault"
    5 trap
    6 _XentMM
    7 dma_get_private
    8 ciss_append_handle
    9 ciss_map_data

PROBLEM: Kernel Memory Fault. If the following stack trace is seen in the crash dump, this patch should be loaded.

    1 panic
    2 event_timeout
    3 printf
    4 panic(s = 0xfffffc0000bb0be0 = "kernel memory fault"
    5 trap
    6 _XentMM
    7 ciss_ReportLogLUN
    8 xpt_callback_thread

PROBLEM: (95337) (PATCH ID: OSF540-417)
********
This problem causes a kernel memory fault with a stack trace that includes ctape_ioctl or ctape_generic_passthru and a partially initialized or uninitialized translation or path structure.

PROBLEM: (92811, 95443) (PATCH ID: OSF540-388)
********
This problem caused a number of error conditions to be seen during HSZ and HSG failovers.
These conditions include ADVFS domain panics, kernel memory faults, and stalled IO.

PROBLEM: (93290, CDISK_ATTR_CHANGE_HANDLER()) (PATCH ID: OSF540-325)
********
This patch fixes a small memory leak in Power Management. The leak was minor and was not likely to affect regular users.

PROBLEM: (95684) (PATCH ID: OSF540-447)
********
PROBLEM: Panic CISS_STARTIO: JOB STRUCTURE SHOULD HAVE BEEN AVAILIBLE. If the following stack trace is seen in the crash dump, this patch should be loaded.

    0 boot
    1 panic
    2 ciss_startio
    3 ciss_scsiio
    4 ciss_proc_deferred_list
    5 ciss_deferred_ccb_thread

PROBLEM: (79336) (PATCH ID: OSF540-425)
********
The BBR code used to log all error messages as soft errors, even if the error was not recovered and the bad block replacement failed. This fix will now log hard errors as hard.

PROBLEM: (95048) (PATCH ID: OSF540-422)
********
This patch fixes a process hang problem in ubc_common_lookup. Sometimes the process may loop forever retrying I/O on a failed disk when it receives a "Not Ready" message. The fix allows the I/O to fail after a few retries.

PROBLEM: (95158) (PATCH ID: OSF540-295)
********
This is a fix for a problem where Smart Array 5300 Logical Volumes were counted as RAID controllers. The problem affected the performance of system management utilities and could generate confusing error messages seen by system managers. Regular users were not affected by this problem.

PROBLEM: (86997) (PATCH ID: OSF540-287)
********
Under certain conditions a device's inquiry data would be overwritten with zeros. This would cause the following error message to be written to the logs: 'DDR - Warning: Device has no "name"'. On devices such as the HSG80 this could also result in I/O stalls, application errors, and possible system hangs.

PROBLEM: (95684, 95747) (PATCH ID: OSF540-452)
********
PROBLEM: Panic CISS_STARTIO: JOB STRUCTURE SHOULD HAVE BEEN AVAILIBLE. If the following stack trace is seen in the crash dump, this patch should be loaded.
    0 boot
    1 panic
    2 ciss_startio
    3 ciss_scsiio
    4 ciss_proc_deferred_list
    5 ciss_deferred_ccb_thread

PROBLEM: (90640) (PATCH ID: OSF540-486)
********
This problem causes tape devices to stop responding, possibly resulting in hung user tasks, possibly requiring a reboot to restore service, or possibly causing errors in tape device access.

PROBLEM: (95977) (PATCH ID: OSF540-565)
********
This patch fixes a situation in which a system with an SA5300 controller can experience a system hang or machine check crash dump when recovering from a controller lockup error or command timeout.

PROBLEM: (95815, 94560, 94529) (PATCH ID: OSF540-561)
********
In certain circumstances an IO may become permanently stalled if a previous close of that device occurred when the queue was stalled.