PROBLEM: (94166, 93923, 94643) (PATCH ID: TCR540-002)
********
Problems fixed by this patch include:

- A hang that occurs when multiple nodes are shutting down together and server-only filesystems are mounted. In this situation, some nodes can enter retry logic that never ends. This occurs far enough into system shutdown processing that the node is generally unusable, but before the "syncing disks..." message is printed to the console.

- A potential panic in the Cluster File System that can occur when using raw Asynchronous I/O. The symptom is a locking violation panic with the string "mcs_unlock: current lock not found" and a stack trace ending in either cfs_condio_iodone() or cfs_condio_issue_io(), such as:

     4 panic                       src/kernel/bsd/subr_prf.c : 1309
     5 simple_lock_fault           src/kernel/kern/lock.c : 2805
     6 mcs_unlock_found_violation  src/kernel/kern/lock.c : 3142
     7 cfs_condio_iodone           src/kernel/tnc_common/tnc_cfe/cfs_directio.c : 870
     8 biodone                     src/kernel/vfs/vfs_bio.c : 1682
     9 volkiodone                  src/kernel/lsm/dec/kiosubr.c : 235
    10 volsiodone                  src/kernel/lsm/common/siosubr.c : 358
    11 vol_mv_write_done           src/kernel/lsm/common/mvio.c : 3596
    12 voliod_iohandle             src/kernel/lsm/common/iod.c : 569
    13 voliod_loop                 src/kernel/lsm/common/iod.c : 372

In addition, this patch adds data validation to the code that encodes and decodes token messages in the cluster, to assist in problem isolation and diagnosis.

PROBLEM: (94069, 93505) (PATCH ID: TCR540-004)
********
This patch relieves pressure on the CMS global DLM lock by allowing AutoFS auto-unmounts to back off when their lock requests are not granted within a reasonable amount of time. This can help avoid turning a transient slowdown into a more persistent one.

PROBLEM: (94199) (PATCH ID: TCR540-022)
********
This patch addresses a problem that occurs when a file is removed on a node that is not the CFS server for the filesystem.
The attributes for the directory were not updated on the CFS server, and hence the attributes returned by the NFS server were not updated. This behavior can cause NFS clients to erroneously continue to apply cached lookup data, since the directory has not changed in their view, leading to stale file handle errors that would not occur in a similar situation on a single-system server.

PROBLEM: (93126, 93724) (PATCH ID: TCR540-003)
********
Excessive FIDS lock contention is observed when a large number of files use system-based file locking. The output of "lockinfo -sort=misses -d 20 -f 200 -p 25 -l 20" shows this lock at the top of the list with a high miss rate.

PROBLEM: (94314) (PATCH ID: TCR540-027)
********
This patch fixes a cluster deadlock that may occur during failover and recovery when direct I/O is in use.

PROBLEM: (95288) (PATCH ID: TCR540-031)
********
This patch prevents a panic due to "simple_lock: uninitialized lock" during bootup. Code that was previously added to help diagnose an infrequent problem with filesystem messages passed between cluster nodes might cause this panic between the point at which the node joins the cluster at the CNX level and the completion of the code that establishes the node's filesystem state as part of the cluster (that is, while the global variable cfs_set_join_completed is still 0). A typical stack trace is:

     0 boot
     1 panic
     2 simple_lock_fault
     3 simple_lock_valid_violation
     4 ckidtokgs
     5 check_cfs_infs
     6 xdr_cfs_infs
     7 xdr_cfswriteargs
     8 xdr_reference
     9 xdr_pointer
    10 xdr_cfswriteargs_p
    11 icsxdr_decode
    12 icssvr_decode_xdr
    13 svr_rcfs_write
    14 icssvr_daemon_from_pool

PROBLEM: (95359) (PATCH ID: TCR540-033)
********
This patch fixes a panic that occurs when an AutoFS file system is auto-unmounted.

PROBLEM: (95221) (PATCH ID: TCR540-048)
********
This patch provides a cluster file system performance enhancement when using file locks to coordinate file access.
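For readers unfamiliar with the access pattern that the file-locking entries above refer to, the following is a minimal sketch of a process using an advisory file lock to coordinate updates to a shared file. It uses Python's standard fcntl module and is an illustration only: the helper name locked_increment and the counter file are hypothetical, nothing here is taken from the patch, and the sketch does not demonstrate the enhancement itself.

```python
import fcntl
import os
import tempfile


def locked_increment(path):
    """Increment an integer counter stored in `path` while holding an
    exclusive advisory lock. Hypothetical helper, for illustration only."""
    # Open read/write, creating the file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Block until this process holds an exclusive lock on the file.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        data = os.read(fd, 64).decode("ascii").strip()
        value = (int(data) if data else 0) + 1
        # Rewrite the counter from the start of the file.
        os.lseek(fd, 0, os.SEEK_SET)
        os.ftruncate(fd, 0)
        os.write(fd, str(value).encode("ascii"))
        return value
    finally:
        # Release the lock, then close the descriptor.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)


if __name__ == "__main__":
    counter = os.path.join(tempfile.mkdtemp(), "counter")
    print(locked_increment(counter))
    print(locked_increment(counter))
```

Each cooperating process calls the helper against the same path. In a cluster, every such lock request has to be coordinated across member nodes, which is why the cost of lock handling matters for workloads of this kind.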
PROBLEM: (76137, 77079, 78667, 84066, 84540, 92595, 92853, 86050, 71400, 94135, 94429, 91495, 93701, 95235) (PATCH ID: TCR540-047)
********
Problem 1: The default configuration has a single point of failure. Install this patch before cluster creation so that the cluster is created with defaults that have a better vote configuration for reliability.
Problem 2: The clu_rolls_ver_lookup_pid attribute has been set to hidden so that the customer cannot accidentally change it. It is used *only* during a rolling upgrade and should not be set by the customer.
Problem 3: The use of hwmgr -view hier in the cluster installation scripts can lead to misconfigured clusters where the customer has renamed devices using the view command. Install this patch before installation.
Problem 4: This patch alerts the user that CAA is not licensed and passes the cluster check when caad is not running and the system does not have a cluster license. The current method of failing the cluster configuration check because of the missing license has caused confusion with customers.
Problem 5: When entering network adapters during the clu_create phase, it was possible to arrive at a loop where the user had mistakenly told clu_create that a netrain device was wanted and could not undo the choice. An option is now available to break out of the netrain entry section.
Problem 6: Currently, clu_create ignores clu_genvmunix when calculating the free space required for cluster_user. This has caused installation to fail several times because clu_create is unable to copy clu_genvmunix to the user partition.
Problem 7: Previously, clu_add_member failed with no information other than a "bad configuration" message if it found a system with the same ics name on the network. It now alerts the user to the actual cause.
Problem 8: When the language was set to en_US.ISO8859-1, shutdown -csh dumped core. This was due to a bad string descriptor.
Problem 9: After a member is installed, the ifaccess.conf file filters on the ics0 device. This has been changed to *not* filter the cluster interconnect, which is the correct behavior.
Problem 10: Previously, clu_create did not properly check for overlapping partitions or for partitions placed on the same disk during installation, both of which are illegal installations. This patch fixes that error.
Problem 11: Previously, clu_delete_member left extra entries in /etc/cfgmgr.auth upon deletion of a member. This patch fixes that behavior.

PROBLEM: (94158) (PATCH ID: TCR540-049)
********
The existing freezefs displays an incorrect error message of "XXX is frozen" when "freezefs -q" is executed on a non-AdvFS filesystem XXX. This patch corrects this behavior by displaying "freezefs XXX: Function not implemented" instead.

PROBLEM: (95033) (PATCH ID: TCR540-032)
********
The performance of write-appends to an external NFS server is poor when the cluster node doing the writes is the CFS server for the NFS client mountpoint and the file is opened with O_APPEND. This occurs, for example, when using '>>' from within ksh or bash. It does not occur from within sh or csh.

PROBLEM: (95593) (PATCH ID: TCR540-045)
********
This patch closes a timing window during asynchronous reads on a CFS client node that may lead to kernel memory corruption.

PROBLEM: (94852) (PATCH ID: TCR540-040)
********
With this patch, cfsmgr now properly returns a failure status when a relocation request has failed.

PROBLEM: (95341) (PATCH ID: TCR540-041)
********
This patch fixes a race condition where stale name cache entries allow file access after file unlink.

PROBLEM: (95434) (PATCH ID: TCR540-046)
********
This patch fixes a panic that may occur during an unmount. In particular, the panic may occur when there are competing unmount processes, including autofs unmount processes.
Here is a typical stack trace:

     0 boot
     1 panic
     2 trap
     3 _XentMM
     4 cfs_mount_type
     5 cfs_send_unmount_rpc
     6 cfs_dounmount
     7 cluster_unmount
     8 unmount
     9 syscall
    10 _Xsyscall

PROBLEM: (95711) (PATCH ID: TCR540-059)
********
This patch fixes an internal problem in the kernel's advfs, ufs and nfs filesystems where extended attributes with extremely long names, greater than 247 characters, could not be set on files. The new limit is 254 characters plus a null string terminator.

PROBLEM: (95633) (PATCH ID: TCR540-058)
********
This patch corrects a problem where a CFS lookup for a mount could leave stale state behind that could adversely affect subsequent NFS operations. This stale state could cause a hard-mounted NFS filesystem on an NFS client to incur ETIMEDOUT errors. This situation can occur when a cluster is accessing a filesystem on a server that the cluster thinks is down (either because the server is truly down or because it temporarily appears down while busy).

PROBLEM: (DE_G04593) (PATCH ID: TCR540-018)
********
This patch addresses an issue with ICS overloading RAD 0 on a NUMA-based system.