PROBLEM: (88862) (PATCH ID: TCR520-031) ********

This patch makes AdvFS fileset quota enforcement work properly on a cluster.

PROBLEM: (89251, GB_G01729) (PATCH ID: TCR520-011) ********

This patch fixes a "deltokp->tok_hldfifo[TOK_RDWR].fifo_req_begin == NULL" assertion failure following filesystem failover recovery, which results in a "cfsdb_assert" panic originating from the cfsdb_assert() routine called by the deletetoken() routine. A message similar to the following may appear on the console or in the message buffer just prior to the panic:

Assert Failed: deltokp->tok_hldfifo[TOK_RDWR].fifo_req_begin == NULL
file: ../../../../src/kernel/tnc_common/tnc_cfe/clitok.c
line: 1118
caller: 0xfffffc0000953548

This problem produces stack traces similar to the following:

10 panic
11 cfsdb_assert
12 deletetoken
13 send_to_server
14 revoke_internal
15 tok_revoke_range
16 cfstok_revoke
17 cfs_tokmsg
18 rcfstok_revoke
19 svr_rcfstok_revoke
20 icssvr_daemon_from_pool

PROBLEM: (TKT220054, BCGM80DRR, SE_G01558, 89342) (PATCH ID: TCR520-005) ********

This patch corrects a problem which can cause cluster members to hang, waiting for the update daemon to flush /var/adm/pacct.

PROBLEM: (VNO88299B) (PATCH ID: TCR520-002) ********

This patch prevents a potential hang that can occur on a CFS failover.
The cfs_fo_thread's stack trace will look like:

1: lock_wait+228: thread_block()
2: lock_read+1004: lock_wait(0x70, 0xfffffc00008eb330, 0xfffffc008839bc00, 0xfffffc0000838544)
3: cfs_reclaim+308: lock_read(0xfffffc00fa5ed440)
4: vclean+388: cfs_reclaim(0xfffffc00a78a9440, 0x7)
5: vgone+196: vclean(0xfffffc00a78a9440, 0x7, 0xfffffc000092e578)
6: cfs_inactive+376: vgone(0xfffffc00a78a9440, 0x3, 0xfffffc000092e578)
7: vrele+276: cfs_inactive(0xfffffc00a78a9440)
8: freefid+220: vrele(0xfffffc00a78a9440)
9: cfs_remove_client_locks+932: freefid(0xfffffc008b607da0)
10: cfs_rec_remove_server_state+80: cfs_remove_client_locks(0xfffffc00f9de8300)
11: cfs_rec_start_server+616: cfs_rec_remove_server_state(0xfffffc00f9de8300, 0x1)
12: cfs_fo_handle_bid_accept+292: cfs_rec_start_server(0xfffffc00fa459dc0, 0xfffffc00faa2a900, 0xfffffc0083bddf80, 0x4)
13: cfs_fo_thread+1204: cfs_fo_handle_bid_accept(0xfffffc00fa5ed400)

PROBLEM: (FR_G01276, 86827) (PATCH ID: TCR520-004) ********

This patch allows POSIX semaphores and message queues to operate properly on a CFS client. These mechanisms are not "clusterized" and cannot be used across nodes, but any application using semaphores or message queues that works on a base system should also work when run on a single node in a cluster (client or server).

PROBLEM: (90288, HGO104051) (PATCH ID: TCR520-039) ********

This problem can manifest itself when reading files which contain "holes". Under certain conditions (outlined below), the CFS Cached Direct Access Read code could access incorrect disk blocks while servicing a read request. Specifically, the problem can manifest itself under the following conditions:

- The file is being read at a CFS client node.
- The underlying physical filesystem type is AdvFS.
- The file is larger than 64k in size (i.e., the read will be handled via the CFS Cached Direct Access Read method).
- The file contains a "hole" at the end of the file.

The net effect of the problem is that when the file is read at the CFS server, the expected file contents are seen, but when read from a CFS client, "random" data is returned. It's also possible that the CFS client node could panic with a panic message of "Assert Failed: bp->b_dev".

PROBLEM: (89109, 89142) (PATCH ID: TCR520-014) ********

PROBLEM: (89142) (PATCH ID: )

This patch corrects a CFS problem which could be seen on a DMAPI/HSM managed filesystem whereby retries are exhausted for an internal DMAPI event which is not cleared for a region after event generation completes successfully. Once the vgoning of the affected file vnode occurs, the following panic results: "Assert Failed: ( t)->cntk_mode <= 2".

PROBLEM: (89109) (PATCH ID: )

This patch corrects a CFS problem which could cause a panic when an internal message is sent to the CFS server of a DMAPI/HSM managed filesystem from a CFS client node and the CFS server node dies while processing the message. The resulting panic string is: "Assert Failed: get_recursion_count(current_threa& CFS_CMI_TO_REC_LOCK(mi)) == 1".

PROBLEM: (89797) (PATCH ID: TCR520-016) ********

This patch corrects a CFS timing window whereby a panic can result if, during a CFS relocation or unmount, multiple client nodes fail simultaneously.

PROBLEM: (89843) (PATCH ID: TCR520-018) ********

This patch corrects a CFS KMF panic which can occur on executing the "cfsmgr -a DEVICES" command on a domain or filesystem which contains LSM volumes, whereby internally LSM reports that the number of disks in an LSM volume is 0.

PROBLEM: (89728) (PATCH ID: TCR520-010) ********

This patch corrects a CFS problem that could cause a panic with the panic string of "CFS_INFS full".
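The trailing-hole condition behind problem 90288 above is easy to set up with ordinary POSIX calls. The following is a minimal, portable Python sketch (not CFS- or AdvFS-specific): writing past 64k and then extending the file with truncate() leaves a hole at the end, and a correct read must return zeros for that range. Whether the extension is stored as a true on-disk hole depends on the filesystem, but the read semantics are the same.

```python
import os
import tempfile

# Create a >64 KiB file whose tail is a "hole": real data followed by a
# truncate() extension that writes no blocks.
path = os.path.join(tempfile.mkdtemp(), "sparse")
with open(path, "wb") as f:
    f.write(b"x" * 70000)   # real data; size passes the 64k threshold
    f.truncate(200000)      # bytes 70000..199999 are a zero-filled hole

with open(path, "rb") as f:
    data = f.read()

assert len(data) == 200000
# A correct read returns zeros for the hole; the 90288 bug returned
# "random" data here when reading from a CFS client.
assert data[70000:] == b"\0" * 130000
```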
PROBLEM: (86949) (PATCH ID: TCR520-012) ********

This patch corrects a CFS problem which could cause a panic when a file is being opened in Direct I/O mode while, at the same time, a separate process is attempting to extend the file via a truncate() syscall.

PROBLEM: (89942) (PATCH ID: TCR520-026) ********

Enabler support for the Enterprise Volume Manager product.

PROBLEM: (GB_G01710) (PATCH ID: TCR520-001) ********

This patch fixes a memory leak in cfscall_ioctl().

PROBLEM: (90891) (PATCH ID: TCR520-068) ********

This patch is required for freezefs support.

PROBLEM: (91142) (PATCH ID: TCR520-100) ********

This fix addresses a data inconsistency that can occur when a CFS client reads a file that was recently written to and whose underlying AdvFS extent map contains more than 100 extents. To find out how many extents a file has, use the showfile -x command.

PROBLEM: (87918) (PATCH ID: TCR520-090) ********

If the mount of a clusterized file system type is attempted onto a non-clusterized file system type, a "TRAP: INVALID MEMORY READ ACCESS FROM KERNEL MODE" panic occurs.

PROBLEM: (87406) (PATCH ID: TCR520-091) ********

This patch prevents panics during unmount processing and during planned relocation. In the former case a representative stack trace would indicate a panic during the cfs_unmount() routine, and in the latter case a panic during the do_pfs_unmount() routine.

PROBLEM: (90070) (PATCH ID: TCR520-104) ********

This patch corrects support for multiple filesets being mounted from the cluster_root domain. When the server node leaves the cluster while other filesets from cluster_root are mounted, all other cluster nodes could panic with the following:

panic : "cfs_do_pfs_mount: pfs and cfs fsids differ on failover"

PROBLEM: (BCGMB2CCS) (PATCH ID: TCR520-080) ********

This patch fixes the assertion failure ERROR != ECFS_TRYAGAIN.
The stack trace looks like:

0 stop_secondary_cpu: 1205
1 panic: 1252
2 event_timeout: 1971
3 printf: 940
4 panic: 1309
5 cfsdb_assert: 452
6 cfs_create: 5186
7 vn_open: 707
8 copen: 3300
9 syscall: 727
10 _Xsyscall: 1785

PROBLEM: (86882) (PATCH ID: TCR520-083) ********

There is a race between cluster mount and name space lookup logic which may result in a transient ENODEV error returned to the lookup. This problem was first noticed in the context of auto-mounting a home directory during remote login, under AutoFS. Infrequently, the initial attempt to log in would fail but subsequent attempts would succeed. This problem may occur, however, with arbitrary applications and depends only upon timing considerations.

PROBLEM: (89503, (for) (PATCH ID: TCR520-089) ********

When a node is booting and a mount request is executed on another node whereby the booting node is selected as the CFS server of the filesystem, the booting node could panic if it is not ready for the mount request.

PROBLEM: (88766) (PATCH ID: TCR520-095) ********

This patch fixes a panic of a node already in the cluster when a node re-joins the cluster. This problem is most likely to occur when quorum has been lost and has just been regained due to the joining node. The panic string will be:

PANIC: CFS_ADD_MOUNT() - DATABASE ENTRY PRESENT

PROBLEM: (80986) (PATCH ID: TCR520-099) ********

One race condition involves a transient failure to reserve a kernel resource associated with the file system to be mounted, and results in an ENODEV errno returned to the mount system call. The second race condition involves the use of a stale memory pointer within the kernel, and will likely result in a panic during the cms_mount_initial() routine.

PROBLEM: (84254) (PATCH ID: TCR520-078) ********

This patch fixes a cluster problem with hung unmounts (possibly seen as hung node shutdowns).
Messages similar to the following will appear on the console of the node serving the filesystem being unmounted:

WARNING: svrcfstok_waitfortokens: svrcfstok structures not cleaned up (retries = 25)
WARNING: svrcfstok_waitfortokens: svrcfstok structures not cleaned up (retries = 25)

PROBLEM: (90221, 91235) (PATCH ID: TCR520-101) ********

This patch addresses a problem where, under certain very rare conditions, a panic with a stack trace similar to the following could result:

PANIC: "pgl_remove: remove from empty (vop)->vu_cleanpl"

4 panic src/kernel/bsd/subr_prf.c
5 ubc_page_release src/kernel/vfs/vfs_ubc.c
6 cfs_putpage src/kernel/tnc_common/tnc_cfe/alpha/cfs_vm_alpha.c
7 ubc_invalidate src/kernel/vfs/vfs_ubc.c
8 vclean src/kernel/vfs/vfs_subr.c
9 vgone src/kernel/vfs/vfs_subr.c

PROBLEM: (90792, 89527) (PATCH ID: TCR520-081) ********

PROBLEM: (90792) (PATCH ID: )

There is a small window where, if a mount update races with an unmount and remount of the same mount point, it is possible for one node to experience a Kernel Memory Fault panic in the cfs_mount_update_accept() function.

PROBLEM: (89527) (PATCH ID: )

The race condition fixed by this patch eliminates a kernel memory fault panic during the cms_shutdown() routine.

PROBLEM: (87952, 87231) (PATCH ID: TCR520-082) ********

PROBLEM: (87952)

A mount update request to a Memory File System (MFS) will cause a Kernel Memory Fault panic.

PROBLEM: (87231)

A bad argument to the mount syscall could cause a panic, and there are some error cases for mounts which will leave the resulting failed mount point busy.

PROBLEM: (DE_G02611, 90821, LU_G02822) (PATCH ID: TCR520-070) ********

This patch prevents a panic ("Assert failed: vp->v_numoutput > 0") or a system hang when a filesystem becomes full and direct async I/O via CFS is used. A vnode will exist that has a v_numoutput value greater than 0, and the thread is hung in vflushbuf_aged().
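For reference, the file-extension half of the race in problem 86949 above can be shown with standard calls. This Python sketch demonstrates only the truncate()-as-extension semantics, not the Direct I/O open racing against it:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data")
with open(path, "wb") as f:
    f.write(b"abc")   # 3 bytes of real data

# truncate() with a length larger than the current size EXTENDS the file,
# zero-filling the new range; this is the extension operation that raced
# with a Direct I/O open() in problem 86949.
os.truncate(path, 1024)

assert os.path.getsize(path) == 1024
with open(path, "rb") as f:
    head = f.read(4)
assert head == b"abc\0"   # original data intact, extension reads as zeros
```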
PROBLEM: (91051) (PATCH ID: TCR520-092) ********

This patch fixes a possible Kernel Memory Fault panic in the function ckidtokgs() with the following stack trace:

Thread 0xfffffc00233aa380: Pid 632125: icssvr_daemon_fr
0 stop_secondary_cpu() [1202, 0xfffffc00005f5a3c]
1 panic() [1252, 0xfffffc0000294a04]
2 event_timeout() [1971, 0xfffffc00005f6c74]
3 printf() [940, 0xfffffc0000293db8]
4 panic() [1309, 0xfffffc0000294b38]
5 trap() [2262, 0xfffffc00005ea680]
6 _XentMM() [2116, 0xfffffc00005e4458]
7 ckidtokgs() [8849, 0xfffffc0000946618]
8 cfs_sentinel_force() [1424, 0xfffffc00008ff064]
9 crfs_fsync_0() [7302, 0xfffffc00008fd068]
10 icstnc_rpc_dispatch() [951, 0xfffffc0000802ef0]
11 icstnc_svr_rcall() [714, 0xfffffc00008029d0]
12 icssvr_daemon_from_pool() [778, 0xfffffc00008be7bc]

PROBLEM: (90178, BCGM918KQ) (PATCH ID: TCR520-059) ********

Fix potential CFS deadlock.

PROBLEM: (IT_G02601, IT_G02586, 90532) (PATCH ID: TCR520-062) ********

This patch corrects a cfsmgr "Not enough space" error when attempting to relocate a file system with a large number of disks. An example of the cfsmgr command and the error the patch corrects is:

# cfsmgr -r -a server=hostname -d AreaBuff_dmn
cfsmgr: subsystem error: Not Enough Space

PROBLEM: (91311) (PATCH ID: TCR520-093) ********

This patch corrects possible CFS file read failures when the storage used for an AdvFS domain is LSM volumes comprised of local-only storage and the node attached to the storage leaves the cluster. The other remaining nodes may get file read failures once the attached node reboots into the cluster and reserves the AdvFS domain. There is also a very small window for a non-LSM AdvFS domain whereby the same problem could occur.

PROBLEM: (90039S) (PATCH ID: TCR520-084) ********

This patch corrects support for multiple filesets being mounted from a cluster node's boot partition domain.
When the server node leaves the cluster while other filesets from its boot partition domain are mounted, other nodes could panic with the following:

Assert failed: CFS_CMI_TO_SERVER (vftocmi(mp))==this_node

PROBLEM: (92941) (PATCH ID: TCR520-136) ********

This patch addresses a cluster problem that can arise when a cluster is serving as an NFS server. The problem can result in "stale" file data being cached at cluster nodes which are servicing NFS requests (i.e., the cached data will not be invalidated if the file is subsequently written to). The problem manifests itself on a per-file basis, and the net result is that reading a file from different cluster nodes could yield different results.

PROBLEM: (92135) (PATCH ID: TCR520-116) ********

This patch corrects a CFS problem which could be seen on a DMAPI/HSM managed filesystem whereby the node executing the HSM and serving the filesystem panics with the following:

(panic): cfstok_hold_tok(): held token table overflow

PROBLEM: (90512) (PATCH ID: TCR520-053) ********

The panic "cmn_err: CE_PANIC: ics_unable_to_make_progress: netisrs stalled" would happen when clua.mod attempted to malloc with wait. Since memory was exhausted, the wait would cause a timeout and panic.

PROBLEM: (90886) (PATCH ID: TCR520-067) ********

A kernel memory fault panic would occur in clua_cnx_unregister if a protocol-specific pcb (tp) could not be allocated for a new TCP connection.

PROBLEM: (90232) (PATCH ID: TCR520-044) ********

When a new member is added to a cluster alias, the selection priority of that member would not be recognized, resulting in connections potentially going to the wrong cluster member.

PROBLEM: (DEK043348, 90164) (PATCH ID: TCR520-077) ********

This patch fixes a problem where the cluster alias subsystem does not send a reply to a client that pings a cluster alias address with a packet size of less than 28 bytes.
PROBLEM: (EVT07519664) (PATCH ID: TCR520-003) ********

This patch allows the command "cfsstat -i" to execute properly. Before the patch you would receive the error:

get_val: read: No such device or address

This patch also corrects a memory leak in the command.

PROBLEM: (93384) (PATCH ID: TCR520-159) ********

This patch addresses a potential Cluster File System deadlock which can occur during CFS failover processing following the failure of a CFS server. Under certain conditions, it's possible for the CFS failover processing to deadlock with outstanding I/Os; but those I/Os are in turn blocked due to the server failure, so the failover never occurs, and any processes attempting to access the file system involved block indefinitely.

PROBLEM: (91910, BCGM10TH0) (PATCH ID: TCR520-124) ********

Prevent process hangs on clusters mounting NFS file systems and accessing plock-ed files on the NFS server. The most obvious symptom is "ps" commands blocked for long periods of time with the following stack trace:

lock_wait()
lock_write()
u_map_copyout()
table()
syscall()
_Xsyscall()

PROBLEM: (92511, 92734) (PATCH ID: TCR520-126) ********

This patch fixes a possible timing window whereby a booting node may panic due to memory corruption if another node which is the server of NFS filesystems or server-only filesystems dies while the booting node is performing remote mounting. The remote mounting occurs, before going to multi-user mode, once the following line is output to the console during boot:

CMS: Joining deferred filesystem sets

PROBLEM: (91620) (PATCH ID: TCR520-109) ********

This patch fixes a clusterwide panic with:

RM_CRASH_NODE_MASK PANIC IN RM_POLL

This can occur when a node leaves the cluster causing quorum loss and then rejoins the cluster, when it was previously the server of a filesystem and a filesystem request gets sent to the rebooted node before it is able to handle it and reject it appropriately.
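Several entries above (e.g. problems 91910 and 90686) involve POSIX advisory file locks of the kind that NFS clients and local cluster processes contend for. A minimal Python sketch of taking and releasing such a lock (standard fcntl locking on any POSIX system; this is generic illustration, not the cluster lock daemon's API):

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared")
fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
try:
    # Exclusive whole-file advisory lock; a conflicting lock held by
    # another process (or, via the lock daemon, by an NFS client) would
    # block here until released.
    fcntl.lockf(fd, fcntl.LOCK_EX)
    os.write(fd, b"updated under lock\n")
    fcntl.lockf(fd, fcntl.LOCK_UN)
finally:
    os.close(fd)

with open(path, "rb") as f:
    assert f.read() == b"updated under lock\n"
```

The correctness issue fixed by TCR520-112 is precisely that locks granted this way inside the cluster could conflict with NFS-client locks that had not yet been reclaimed after a failover.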
PROBLEM: (TKT291292) (PATCH ID: TCR520-119) ********

This patch fixes a problem in which a cluster member may panic with the panic string "kernel memory fault".

PROBLEM: (93007) (PATCH ID: TCR520-147) ********

If the cluster_root domain consists of an LSM volume and the underlying physical storage is not connected to a booting node, the booting node may hang and display the following message on the console:

Waiting for cluster mount to complete
<5>lsm:volio: Cannot open disk dsk88: kernel error 6

PROBLEM: (92325, BCGM303KH) (PATCH ID: TCR520-118) ********

This patch prevents a memory leak from occurring when using small, unaligned Direct I/O access (i.e., access that is not aligned on a 512-byte boundary and doesn't cross a 512-byte boundary). Analysis of a forced crash or dumpsys will show a large amount of memory consumed in the malloc bucket CFS GENERAL. Example:

124 CFS GENERAL 38351112032

PROBLEM: (87683) (PATCH ID: TCR520-110) ********

When "cfsmgr -a statistics" is invoked and the file system named is not mounted, it is possible that erroneous information will be displayed for the server name.

PROBLEM: (91582) (PATCH ID: TCR520-111) ********

This patch corrects support for Synchronized I/O in CFS. Files opened from remote clients using the file status flags O_DSYNC or O_RSYNC were not conforming to the behaviors documented in the open(2) manpage.

PROBLEM: (89966) (PATCH ID: TCR520-120) ********

This patch eliminates erroneous EIO errors which could occur if a client node becomes a server during a rename/unlink/rmdir system call, between the initial lookup done by the vfs layer and the subsequent call to the pfs operation.

PROBLEM: (92816) (PATCH ID: TCR520-130) ********

This patch addresses a CFS problem that could result in degraded performance when sequentially reading, from a CFS client, at file offsets past 2GB.

PROBLEM: (90686, 91982) (PATCH ID: TCR520-112) ********

This patch addresses a file locking problem which can arise when using a cluster as an NFS file server.
In the case of a failure of one of the cluster members, specifically the CFS server for any exported NFS filesystems, the NFS file locking service in the cluster may not be re-started quickly enough. The result could be that processes running within the cluster may be granted file locks that conflict with locks that were granted to NFS clients but weren't reclaimed during the NFS lock daemon's "grace period".

PROBLEM: (91622, 91977) (PATCH ID: TCR520-113) ********

This patch addresses a CFS problem where file access rights may not appear consistent cluster-wide. The effect could be that, for a given file, a particular user may be erroneously denied file access from certain cluster nodes while being granted the expected access from others. This problem is far more likely to occur in the case of NFS filesystems (i.e., cluster as NFS client) than for local filesystem types.

PROBLEM: (86883) (PATCH ID: TCR520-107) ********

This patch fixes two problems. First, a race between file name lookup and cluster mount may result in the lookup erroneously failing. This is more likely in the presence of AutoFS. The second problem is a file system recovery during failover that deadlocks. This occurs only with an AdvFS fileset mounted beneath the subdirectory of /etc/fdmns that corresponds to its domain.

PROBLEM: (88878, 92739) (PATCH ID: TCR520-131) ********

This patch corrects a Cluster File System (CFS) performance issue seen when multiple threads/processes simultaneously access the same file on an SMP (>1 CPU) system. The specific performance issue addressed by this patch is a dramatic drop in filesystem performance when more than one or two threads/processes on the same cluster node are simultaneously accessing the same file and the node has more than one CPU.

PROBLEM: (90221, 91235) (PATCH ID: TCR520-151) ********

This patch addresses an obscure CFS problem that could result in a cluster-wide hang.
When recycling dirty pages, a cross-node deadlock could occur between CFS and AdvFS, the result of which would typically render the cluster completely unresponsive.

PROBLEM: (91428) (PATCH ID: TCR520-108) ********

When ACLs are enabled on the system and there is a default ACL on a directory, files and directories created in that directory should inherit the default ACL and permissions based on the rules that are discussed in detail in the Security manual. In particular, the file permissions should be based on the intersection of the requested mode (unmodified by the umask) and the permissions from the default ACL. Files and directories created from CFS clients are given the permissions directly from the default ACL without the required intersection with the requested mode. This patch ensures that files created from a CFS client node will be given the same permissions that they would get if the create request were issued at the CFS server or on a non-cluster system.

PROBLEM: (DE_G03037) (PATCH ID: TCR520-144) ********

This patch fixes a problem where cluster filesystem (CFS) I/O and AdvFS domain access cause processes to hang. The following sequence of commands, run across two nodes (Node1 and Node2), reproduces the hang:

cfsmgr -r -a SERVER=Node2 /fs
cd /fs/Dir
dd if=/dev/zero of=/bigfile
wait a little bit...
ls -l &
cd /fs/Dir
ls -l &
before ls completes, halt node
ps & - hangs
ls -l & - hangs
df -k & - hangs

PROBLEM: (93354) (PATCH ID: TCR520-157) ********

This patch fixes a hang during node shutdown that occurs when some other node in the cluster serves a server_only file system.

PROBLEM: (HPAQA2CB0, DE_G03705) (PATCH ID: TCR520-148) ********

This patch fixes a kernel memory fault from clua_cnx_thread.
0 stop_secondary_cpu : 1205
1 panic : 1252
2 event_timeout : 1971
3 printf : 940
4 panic : 1309
5 trap : 2262
6 _XentMM : 2115
7 malloc_internal : 1720
8 kch_join_attach_internal : 381
9 kch_join_set : 335
10 clua_join_set : 1666
11 clua_aliasset_common : 508
12 cluaioc_alias : 279
13 clua_cfgmgr_dispatch : 592
14 clua_configure : 496
15 kmodcall : 696
16 syscall : 713
17 _Xsyscall : 1785

PROBLEM: (92701, STL401547) (PATCH ID: TCR520-129) ********

This patch addresses a cluster problem where an application which uses file locking may experience degraded performance. In the referenced CLD, the customer experienced severely degraded performance of their Cobol application. The application didn't explicitly do file locking, but it's likely that something in the Cobol run-time environment did.

PROBLEM: (89245, 90061) (PATCH ID: TCR520-033) ********

There is a firmware issue in the HSG80 controllers that, during a cluster transition, can cause the HSG80 controllers to crash. This controller crash can then cause loss of data access to the logical volumes on that pair of HSG80 controllers. If cluster root is on that HSG80, a cluster domain panic can result. The symptoms of this problem are DRD barrier errors logged to the /usr/adm/messages files and to the console. This can also be verified by examining the HSG80 fmu logs and the HSG80 console. The key text in determining this problem is as follows:

During processing to maintain consistency of the data for Persistent Reserve SCSI commands, an internal inconsistency was detected.
> Last Failure Parameter[0] contains a code defining the precise nature of the inconsistency.

Example output (run fmu):

FMU> show last most
Last Failure Entry: 6. Flags: 006FF901
Template: 1.(01) Description: Last Failure Event
Occurred on 26-SEP-2001 at 10:10:29
Power On Time: 1. Years, 30. Days, 9. Hours, 1. Minutes, 11.
Seconds
Controller Model: HSG80
Serial Number: ZG02900845
Hardware Version: E05(2D)
Software Version: XCF4P-0(FF)
Instance Code: 0102030A
Description: An unrecoverable software inconsistency was detected or an intentional restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description: Executive Services
Reporting component's event number: 2.(02) Event Threshold: 10.(0A)
Classification: SOFT. An unexpected condition detected by a controller software component (e.g., protocol violations, host buffer access errors, internal inconsistencies, uninterpreted device errors, etc.) or an intentional restart or shutdown of controller operation is indicated.
Last Failure Code: 43230101 Last Failure Parameter[0.] 00000013
Last Failure Code: 43230101 Description: During processing to maintain consistency of the data for Persistent Reserve SCSI commands, an internal inconsistency was detected.
> Last Failure Parameter[0] contains a code defining the precise nature of the inconsistency.
Reporting Component: 67.(43) Description: Host Port Protocol Layer
Reporting component's event number: 35.(23)
Restart Type: 0.(00) Description: Full software restart
Active Thread: FOC I960 Priority: 0.(00)
Interrupt Stack Guard is intact
NULL Thread Stack Guard is intact
Thread Stack Guard State Flags (ID# Bit; 0=intact,1=not intact): 00000000

PROBLEM: (89054) (PATCH ID: TCR520-017) ********

This patch fixes a situation in which a rebooting cluster member would panic shortly after rejoining the cluster if another cluster member was doing remote disk I/O to the rebooting member when it was rebooted.

PROBLEM: (GB_G01153) (PATCH ID: TCR520-006) ********

This patch allows high density tape drives to use the high density compression setting in a cluster environment. While opening a tape density file tape_d*, the DRD driver would issue a special ioctl by using the dev_t of tape0 (the default density file).
This caused all of the device drivers to register tape_d* as standard density; consequently, no matter what density was specified, they all ended up with the standard mode. The fix makes the DRD driver pick the correct dev_t.

PROBLEM: (HPAQ50WCZ) (PATCH ID: TCR520-007) ********

During cluster failover, if the cluster has any shared served disks, such as a shared CD-ROM, the cluster members that directly connect to the device can crash with a message similar to:

trap: invalid memory ifetch access from kernel mode
faulting virtual address: 0x0000000000000000
pc of faulting instruction: 0x0000000000000000
ra contents at time of fault: 0xffffffff0052afd0
sp contents at time of fault: 0xfffffe068993f820
panic (cpu 1): kernel memory fault

PROBLEM: (89405) (PATCH ID: TCR520-020) ********

This patch fixes a cluster-wide hang that occurs when DRD node failover is stuck and unable to bid a new server for a served device.

PROBLEM: (90685, 90503) (PATCH ID: TCR520-064) ********

There is a firmware issue in the HSG80 controllers that, during a cluster transition, can cause the HSG80 controllers to crash. This controller crash can then cause loss of data access to the logical volumes on that pair of HSG80 controllers. If cluster root is on that HSG80, a cluster domain panic can result. The symptoms of this problem are DRD barrier errors logged to the /usr/adm/messages files and to the console. This can also be verified by examining the HSG80 fmu logs and the HSG80 console. The key text in determining this problem is as follows:

During processing to maintain consistency of the data for Persistent Reserve SCSI commands, an internal inconsistency was detected.
> Last Failure Parameter[0] contains a code defining the precise nature of the inconsistency.

Example output (run fmu):

FMU> show last most
Last Failure Entry: 6. Flags: 006FF901
Template: 1.(01) Description: Last Failure Event
Occurred on 26-SEP-2001 at 10:10:29
Power On Time: 1. Years, 30. Days, 9. Hours, 1. Minutes, 11.
Seconds
Controller Model: HSG80
Serial Number: ZG02900845
Hardware Version: E05(2D)
Software Version: XCF4P-0(FF)
Instance Code: 0102030A
Description: An unrecoverable software inconsistency was detected or an intentional restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description: Executive Services
Reporting component's event number: 2.(02) Event Threshold: 10.(0A)
Classification: SOFT. An unexpected condition detected by a controller software component (e.g., protocol violations, host buffer access errors, internal inconsistencies, uninterpreted device errors, etc.) or an intentional restart or shutdown of controller operation is indicated.
Last Failure Code: 43230101 Last Failure Parameter[0.] 00000013
Last Failure Code: 43230101 Description: During processing to maintain consistency of the data for Persistent Reserve SCSI commands, an internal inconsistency was detected.
> Last Failure Parameter[0] contains a code defining the precise nature of the inconsistency.
Reporting Component: 67.(43) Description: Host Port Protocol Layer
Reporting component's event number: 35.(23)
Restart Type: 0.(00) Description: Full software restart
Active Thread: FOC I960 Priority: 0.(00)
Interrupt Stack Guard is intact
NULL Thread Stack Guard is intact
Thread Stack Guard State Flags (ID# Bit; 0=intact,1=not intact): 00000000

PROBLEM: (90961) (PATCH ID: TCR520-075) ********

Resources like tapes/changers handled by CAA do not come online (according to CAA). The caa_stat command will return something like this even though there is no problem accessing the device:

NAME=tapeone
TYPE=tape
TARGET=ONLINE on hamm
TARGET=ONLINE on woody
STATE=OFFLINE on hamm
STATE=OFFLINE on woody

PROBLEM: (90599) (PATCH ID: TCR520-079) ********

This patch fixes a problem where a tape changer is only accessible from the member that's the DRD server for the changer.
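Returning to problem 91428 above (default ACL inheritance on CFS clients): the rule is that the effective permissions are the bitwise intersection of the requested creation mode (not filtered by the umask) and the permissions granted by the directory's default ACL. A small illustrative sketch of that arithmetic (the function name is ours, not the kernel's):

```python
def effective_mode(requested: int, default_acl: int) -> int:
    """Intersection rule: the requested mode AND'ed with the permissions
    from the directory's default ACL (umask deliberately not applied)."""
    return requested & default_acl

# open(..., O_CREAT, 0o644) under a default ACL granting rwxrwxr-x (0o775):
assert effective_mode(0o644, 0o775) == 0o644
# open(..., O_CREAT, 0o666) under a default ACL granting rwxr-x--- (0o750):
assert effective_mode(0o666, 0o750) == 0o640
```

Before the fix, a create issued from a CFS client skipped the intersection and handed the file the default ACL's permissions directly (0o750 in the second example).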
PROBLEM: (91283) (PATCH ID: TCR520-094) ********

This patch fixes a problem where an open request to a disk in a cluster fails with an illegal errno (>=1024).

PROBLEM: (87387) (PATCH ID: TCR520-096) ********

This patch fixes a problem where an open to a tape drive in a cluster would take 6 minutes (instead of 2) to fail if there were no tape in the drive.

PROBLEM: (91286) (PATCH ID: TCR520-097) ********

This patch solves a problem in which a cluster would hang the next time a node was rebooted after a tape device was deleted from the cluster.

PROBLEM: (90755) (PATCH ID: TCR520-088) ********

This patch fixes a domain panic in a cluster when a file system is mounted on a disk accessed remotely over the cluster interconnect.

PROBLEM: (BCSMA0T58/90257) (PATCH ID: TCR520-098) ********

This patch fixes a race condition that occurs when multiple unbarrierable disks fail at the same time.

PROBLEM: (TPOQ57031) (PATCH ID: TCR520-103) ********

This patch fixes a kernel memory fault in drd_open.

PROBLEM: (93056) (PATCH ID: TCR520-155) ********

This is a regression in tcr51asupportos.bl2 and wcalphaos and as such cannot be seen by a customer, since customers have never received that code.

PROBLEM: (92113, 92114, 92235, 92689) (PATCH ID: TCR520-149) ********

This patch addresses the following issues:

- Performance fix associated with locking hierarchy.
- Added support for the new cluster-safe IOCTLs added in the base submit.
- Cleans up compiler warnings.
- Fixes problems associated with KCH collisions.
- DRD tape/changer access path locking issues.
- Drdmgr message errors.
- DRD KCH proposal to reject the server transfer issues.
- Better DRD console messages.
- DRD CNX recovery issues.
- CNX drain issues.
- CNX device recovery issues.
- Reservation conflicts and MUNSA_REJECTS.
- Eliminates a race condition in the open/reopen code.

PROBLEM: (93630) (PATCH ID: TCR520-162) ********

This patch ensures that this kit works properly during a rolling upgrade process.
The problem being resolved is within this kit and is not present in previous OS versions or patch kits.

PROBLEM: (93022) (PATCH ID: TCR520-167) ********

This fix addresses a problem in which a cluster or a device can get I/Os stuck, or a cluster node may panic, after a device has been deleted.

PROBLEM: (93126, 93724) (PATCH ID: TCR520-166) ********

Excessive FIDS lock contention is observed when a large number of files use system-based file locking. The result from "lockinfo -sort=misses -d 20 -f 200 -p 25 -l 20" will show the FIDS lock at the top of the list with a high miss rate.

PROBLEM: (94199) (PATCH ID: TCR520-197) ********

This patch addresses a problem when a file is removed on a node that is not the CFS server for the filesystem. The attributes for the directory were not updated on the CFS server, and hence the attributes returned by the NFS server would not be updated. This behavior can cause NFS clients to erroneously continue to apply cached lookup data, since the directory had not changed in their view, leading to stale file handle errors, when a similar situation on a single-system server would not.

PROBLEM: (94106) (PATCH ID: TCR520-198) ********

This patch fixes a hang condition in the Device Request Dispatcher (DRD) when accessing a failed disk.

PROBLEM: (94082) (PATCH ID: TCR520-184) ********

This patch fixes a problem in the cluster kernel where a cluster member panics when another member is rebooted.

PROBLEM: (BCGMM1774, GB_G04904, 94740) (PATCH ID: TCR520-203) ********

This patch prevents a "simple_lock: time limit exceeded" panic or an "Assert Failed: brp->br_fs_svr_out" panic that can be seen while executing chfsets on a cluster.
Example stack traces are:

0 boot
1 panic
2 simple_lock_fault
3 simple_lock_time_violation
4 cfs_blkrsrv_flush
5 msfs_syscall_op_set_bfset_params_activate
6 msfs_real_syscall
7 msfs_syscall
8 syscall
9 _Xsyscall

crash> tf
0 stop_secondary_cpu
1 panic
2 event_timeout
3 printf
4 panic
5 cfsdb_assert
6 cfs_blkrsrv_svrupdate
7 cfs_pfscachewrite
8 cfs_write
9 vn_write
10 rwuio
11 write
12 syscall
13 _Xsyscall

PROBLEM: (94157, 94221, 94440) (PATCH ID: TCR520-188) ******** This patch fixes a problem in the cluster kernel where a cluster member hangs during cluster shutdown or while booting. PROBLEM: (94279) (PATCH ID: TCR520-179) ******** This patch fixes a problem in the cluster kernel where a cluster member panics when a tape device is accessed. PROBLEM: (93203, TKT299628, BCGM40HZM) (PATCH ID: TCR520-163) ******** This patch fixes a token problem which could cause an unmount to hang. During this problem, messages similar to the following will also be seen on the console: WARNING: svrcfstok_waitfortokens: svrcfstok structures not cleaned up (retries = 1100) PROBLEM: (92409) (PATCH ID: TCR520-212) ******** This patch fixes a CNX manager panic encountered while multiple cluster nodes are booted simultaneously. The panic string seen is: CNX MGR: Invalid configuration for cluster seq disk PROBLEM: (94120) (PATCH ID: TCR520-174) ******** This patch fixes a regression introduced in tcr51asupportos.bl2 and wcalphaos; because that code was never shipped, customers cannot encounter the problem. PROBLEM: (93635) (PATCH ID: TCR520-164) ******** This fix addresses a problem in which two nodes leaving the cluster within a short (but not too short) time period would cause I/Os on some devices to get stuck. PROBLEM: (93870) (PATCH ID: TCR520-165) ******** This fix addresses a problem in which a new device would not be properly configured in a cluster if the device was discovered during a boot. On some of the booting nodes the device would not be considered locally connected although it is.
This can create availability problems later. PROBLEM: (93996) (PATCH ID: TCR520-185) ******** The Device Request Dispatcher (DRD) should retry getting disk attributes when EINPROGRESS is returned from the disk driver. This problem can be seen by deleting a device in a cluster and then adding it back. The console message is: drd_get_disk_attributes (1234) - ksm_get_attributes failed 36 PROBLEM: (DE_G04593) (PATCH ID: TCR520-210) ******** This patch addresses an issue with ICS overloading RAD 0 on a NUMA-based system. PROBLEM: (94911, 95063) (PATCH ID: TCR520-215) ******** This patch fixes a possible race condition between a SCSI reservation conflict and an I/O drain, which could result in a hang. The race condition occurs when a SCSI event, such as a path failover, causes a reservation conflict while a cluster member is in the process of issuing an I/O barrier due to an event such as a member transition. This results in a hang on the cluster member attempting to barrier. Examination of the system in this state, or of a forced crash dump, will reveal one or more drd_event_threads sleeping in ccmn_send_ccb_wait3(). The hang is ultimately caused by in-flight I/Os that are pending due to the above thread. Here is a typical stack trace:

THREAD: fffffc0003816e00
0 thread_block
1 sleep_prim
2 mpsleep
3 ccmn_send_ccb_wait3
4 ccmn_path_ping3
5 ccmn_resolve_paths3
6 cdisk_ioctl
7 drd_issue_local_ioctl
8 drd_check_path
9 drd_handle_event_io_drained
10 drd_handle_one_event
11 drd_handle_events
12 drd_event_thread

PROBLEM: (94385) (PATCH ID: TCR520-186) ******** This fix adds support for multiple opens of tape libraries/media changers. Prior to this fix, the Device Request Dispatcher would fail multiple opens on tape libraries/media changers, returning EBUSY (errno 16). PROBLEM: (92799) (PATCH ID: TCR520-216) ******** This patch alleviates a condition in which a cluster member takes an extremely long time to boot when using LSM.
The problem occurs when a Fibre Channel disk that belongs to an LSM set goes bad. The condition is seen while booting a system into a cluster where the other members are far enough along to recognize their LSM sets. Immediately after the "starting LSM" boot message, the booting system will appear to hang and will periodically output the following message to the user console: "DRD failed register against returned 5" PROBLEM: (90608) (PATCH ID: TCR520-191) ******** This patch corrects a reference counting problem on objects related to mountpoints. The problem had resulted in the unexpected persistence of an object seen during cluster mount, and the subsequent failure of an assertion within the code. The problem can be recognized by the panic string "CFS_JOIN_COMMIT: NO DB ENTRY OR INFO STRUCT". PROBLEM: (92789) (PATCH ID: TCR520-173) ******** This patch relieves pressure on the CMS global DLM lock by allowing AutoFS auto-mounts to back off when their lock requests are not granted within a reasonable amount of time. This can help avoid turning a transient slowdown into one which is more persistent. PROBLEM: (93923) (PATCH ID: TCR520-171) ******** This patch addresses a potential panic in the Cluster File System which can occur when using raw asynchronous I/O.
When the problem occurs, the symptom will be a locking violation panic with the following string: "mcs_unlock: current lock not found" and a stack trace ending in either cfs_condio_iodone() or cfs_condio_issue_io(), such as:

4 panic src/kernel/bsd/subr_prf.c : 1309
5 simple_lock_fault src/kernel/kern/lock.c : 2805
6 mcs_unlock_found_violation src/kernel/kern/lock.c : 3142
7 cfs_condio_iodone src/kernel/tnc_common/tnc_cfe/cfs_directio.c : 870
8 biodone src/kernel/vfs/vfs_bio.c : 1682
9 volkiodone src/kernel/lsm/dec/kiosubr.c : 235
10 volsiodone src/kernel/lsm/common/siosubr.c : 358
11 vol_mv_write_done src/kernel/lsm/common/mvio.c : 3596
12 voliod_iohandle src/kernel/lsm/common/iod.c : 569
13 voliod_loop src/kernel/lsm/common/iod.c : 372

PROBLEM: (94580) (PATCH ID: TCR520-190) ******** This patch addresses an assertion failure which can occur in the Cluster File System when file system quotas are in use. The problem can only happen if a user has opened a very large number of files (at least 32768) since the cluster was booted. The assertion failure is: Assert Failed: dq->cfs_dq_cnt > 0 The stack traceback will be similar to the following:

1 panic src/kernel/bsd/subr_prf.c
2 cfsdb_assert src/kernel/tnc_common/tnc_cfe/alpha/cfs_debug.c
3 cfs_dqget src/kernel/tnc_common/tnc_cfe/alpha/cfs_quota.c
4 cfs_getinoquota src/kernel/tnc_common/tnc_cfe/alpha/cfs_quota.c
5 cfs_rwvp_cache src/kernel/tnc_common/tnc_cfe/alpha/cfs_vm_alpha.c
6 cfs_cachewrite src/kernel/tnc_common/tnc_cfe/alpha/cfs_vm_alpha.c
7 cfs_write src/kernel/tnc_common/tnc_cfe/cfs_vm_osi.c
8 vn_write src/kernel/vfs/vfs_vnops.c
9 rwuio src/kernel/bsd/sys_generic.c
10 write src/kernel/bsd/sys_generic.c
11 syscall src/kernel/arch/alpha/syscall_trap.c
12 _Xsyscall src/kernel/arch/alpha/locore.s

PROBLEM: (94645, 94795) (PATCH ID: TCR520-209) ******** This patch fixes kernel memory faults that can happen if invalid arguments are supplied to the mount system call on a cluster.
A typical stack traceback might be either of the following:

Example 1
---------
0 panic src/kernel/bsd/subr_prf.c
1 trap src/kernel/arch/alpha/trap.c
2 _XentMM src/kernel/arch/alpha/locore.s
3 strlen src/kernel/arch/alpha/fastcopy.s
4 cms_mfs_mount_args src/kernel/tnc_common/tnc_cfe/cms_utils.c
5 cms_copy_mount_args src/kernel/tnc_common/tnc_cfe/cms_utils.c
6 cms_mount_preprocess src/kernel/tnc_common/tnc_cfe/cms_kgs.c
7 cluster_mount src/kernel/tnc_common/tnc_cfe/cfs_mnthooks.c
8 mount1 src/kernel/vfs/vfs_syscalls.c
9 syscall src/kernel/arch/alpha/syscall_trap.c
10 _Xsyscall src/kernel/arch/alpha/locore.s

Example 2
---------
0 panic src/kernel/bsd/subr_prf.c
1 trap src/kernel/arch/alpha/trap.c
2 _XentMM src/kernel/arch/alpha/locore.s
3 copystr src/kernel/arch/alpha/copy.c
4 namei src/kernel/vfs/vfs_lookup.c
5 cms_getmdev src/kernel/tnc_common/tnc_cfe/cms_utils.c
6 cms_ufs_device_list src/kernel/tnc_common/tnc_cfe/cms_utils.c
7 cms_select_cfs_server src/kernel/tnc_common/tnc_cfe/cms_kgs.c
8 cms_ufs_mount_initial src/kernel/tnc_common/tnc_cfe/cms_utils.c
9 cms_mount_preprocess src/kernel/tnc_common/tnc_cfe/cms_kgs.c
10 cluster_mount src/kernel/tnc_common/tnc_cfe/cfs_mnthooks.c
11 mount1 src/kernel/vfs/vfs_syscalls.c
12 syscall src/kernel/arch/alpha/syscall_trap.c
13 _Xsyscall src/kernel/arch/alpha/locore.s

PROBLEM: (94643) (PATCH ID: TCR520-194) ******** When multiple nodes are shutting down together and there are server-only filesystems mounted, it is possible that some nodes will enter retry logic which will never end. This occurs far enough into the system shutdown processing that the node will generally be unusable, but before the "syncing disks..." message is printed to the console. PROBLEM: (89024/94321) (PATCH ID: TCR520-183) ******** This patch fixes a potential system crash which could occur when adding a cluster alias. This could be seen as a kernel memory fault in cluaioc_alias or other routines while accessing the inifaddr hash.
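The mount-argument faults shown in the tracebacks above arise from the kernel dereferencing an unvalidated user-supplied string (strlen or copystr on a bad pointer). The following is a minimal sketch of the defensive bounded-copy pattern such a fix relies on; the function name copy_mount_arg, the buffer sizes, and the check order are illustrative assumptions, not the actual Tru64 routines:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative sketch only: validate a mount argument string before use.
 * The real kernel would use copyinstr() across the user/kernel boundary;
 * here we model the two checks that prevent the fault: reject NULL before
 * any dereference, and never scan past the destination buffer looking
 * for a terminator. */
#define ARG_MAX_LEN 256  /* hypothetical cap on a mount argument string */

static int copy_mount_arg(const char *uarg, char *kbuf, size_t kbuflen)
{
    if (uarg == NULL || kbuf == NULL || kbuflen == 0)
        return EINVAL;               /* reject NULL before dereferencing */

    for (size_t i = 0; i < kbuflen; i++) {  /* bounded copy */
        kbuf[i] = uarg[i];
        if (uarg[i] == '\0')
            return 0;                /* NUL found within bounds: success */
    }
    return ENAMETOOLONG;             /* no terminator within the buffer */
}
```

Either check failing returns an errno to the caller instead of letting an unbounded strlen() walk off into unmapped memory and panic.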
PROBLEM: (93677) (PATCH ID: TCR520-176) ******** This patch improves the responsiveness of EINPROGRESS handling during the issuing of I/O barriers. The fix removes a possible infinite loop scenario which could occur due to the deletion of a storage device. The issue with EINPROGRESS responsiveness was continued looping while waiting for a disk structure to become available: no attempts were being made to force the availability of the disk structure, no retry limit was being enforced, and no checks were being made for deleted devices. This combination presented the possibility of infinite retry attempts. PROBLEM: (94069, 93505) (PATCH ID: TCR520-178) ******** This patch relieves pressure on the CMS global DLM lock by allowing AutoFS auto-unmounts to back off when their lock requests are not granted within a reasonable amount of time. This can help avoid turning a transient slowdown into one which is more persistent. PROBLEM: (94166, 95439) (PATCH ID: TCR520-187) ******** This patch adds data validation to the code which encodes/decodes token messages in the cluster, in order to assist in problem isolation and diagnosis. PROBLEM: (95288) (PATCH ID: TCR520-226) ******** This patch prevents a "simple_lock: uninitialized lock" panic during bootup. Code that was previously added to help diagnose an infrequent problem with filesystem messages passed between cluster nodes can cause this panic between the point at which the node joins the cluster at the CNX level and the completion of the code that establishes the node's filesystem state as part of the cluster (while the global variable cfs_set_join_completed is still 0).
A typical stack trace is:

0 boot
1 panic
2 simple_lock_fault
3 simple_lock_valid_violation
4 ckidtokgs
5 check_cfs_infs
6 xdr_cfs_infs
7 xdr_cfswriteargs
8 xdr_reference
9 xdr_pointer
10 xdr_cfswriteargs_p
11 icsxdr_decode
12 icssvr_decode_xdr
13 svr_rcfs_write
14 icssvr_daemon_from_pool

PROBLEM: (95541, 95368) (PATCH ID: TCR520-243) ******** If a quorum disk is on a parallel SCSI bus, bus resets will cause the quorum disk to be placed into the MUNSA reject state. This prevents all I/O to the disk. If the quorum disk's votes are needed to maintain quorum, those votes will be lost, resulting in the cluster hanging.
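The bounded-retry approach described for EINPROGRESS handling above (PROBLEMs 93677 and 93996) can be sketched as follows. Every name here (struct drd_disk, attrs_fetch, DRD_ATTR_MAX_RETRIES, the field names) is a hypothetical stand-in for the actual DRD code; the point is only the shape of the loop: a retry cap, a deleted-device check on every pass, and distinct errors for each exit path instead of spinning forever:

```c
#include <assert.h>
#include <errno.h>

#define DRD_ATTR_MAX_RETRIES 10  /* illustrative retry cap */

struct drd_disk {
    int deleted;                 /* set when the device is removed */
    int attempts_until_ready;    /* simulates driver readiness delay */
};

/* Simulated driver call: returns EINPROGRESS until the device is ready.
 * Stands in for something like ksm_get_attributes() in the real code. */
static int attrs_fetch(struct drd_disk *dp)
{
    if (dp->attempts_until_ready > 0) {
        dp->attempts_until_ready--;
        return EINPROGRESS;
    }
    return 0;
}

static int drd_get_attrs_bounded(struct drd_disk *dp)
{
    for (int retry = 0; retry < DRD_ATTR_MAX_RETRIES; retry++) {
        if (dp->deleted)
            return ENODEV;       /* device vanished: stop retrying */
        int err = attrs_fetch(dp);
        if (err != EINPROGRESS)
            return err;          /* success, or a hard error */
        /* the real code would sleep or yield here before retrying */
    }
    return ETIMEDOUT;            /* retry limit reached */
}
```

Without the deleted check and the retry cap, a device deleted mid-loop leaves the caller retrying EINPROGRESS indefinitely, which is exactly the infinite-loop scenario the patch removes.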