HP Services Software Patches - alpshad09_061
 
NOTE:  An OpenVMS saveset or PCSI installation file is stored
       on the Internet in a self-expanding compressed file.
       The name of the compressed file will be kit_name-dcx_vaxexe
       for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
 
       Once the file is copied to your system, it can be expanded
       by typing RUN compressed_file.  The resultant file will
       be the OpenVMS saveset or PCSI installation file which
       can be used to install the ECO.
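
       As a minimal sketch for this kit on OpenVMS Alpha: the file name
       ALPSHAD09_061.A-DCX_AXPEXE and the location DKA100:[KITS] are
       assumptions for illustration, as is the use of VMSINSTAL (which
       applies only if the expanded file is a saveset, not a PCSI file).

```dcl
$ ! Move to the directory holding the downloaded file (hypothetical).
$ SET DEFAULT DKA100:[KITS]
$ ! Expand the self-expanding compressed file (assumed file name).
$ RUN ALPSHAD09_061.A-DCX_AXPEXE
$ ! If the result is a saveset, install it with VMSINSTAL.
$ @SYS$UPDATE:VMSINSTAL ALPSHAD09_061 DKA100:[KITS]
```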
 
Copyright (c) Digital Equipment Corporation 1994, 1996.  All rights reserved.

PRODUCT:    Volume Shadowing for OpenVMS Alpha

OP/SYS:     OpenVMS Alpha

SOURCE:     Digital Equipment Corporation

ECO INFORMATION:

     ECO Kit Name:  ALPSHAD09_061
     ECO Kits Superseded by This ECO Kit:  ALPSHAD07_061
                                           AXPSHAD06_061   (AXPSHAD)
                                           AXPSHAD04_061
                                           AXPSHAD02_061   (CSCPAT_2045)
                                           AXPSHAD01_061
                                           AXPSHAD01_015
     ECO Kit Approximate Size:  5904 Blocks
     Kit Applies To:  OpenVMS Alpha V6.1, V6.1-1H1, V6.1-1H2
     System Reboot Necessary:  Yes

NOTES:  When you install the ALPSHAD09_061 remedial kit you must also
        install the ALPSHAD10_061 or later remedial kit before rebooting
        your system.  A system with the ALPSHAD09_061 kit installed but
        without ALPSHAD10_061, or a later SHADOW kit, may experience the
        MERGE problem or the SHADZEROMBR bugcheck problem, which were
        resolved in the ALPSHAD10_061 remedial kit.

        NOTE:  The ALPSHAD10_061 kit was placed on engineering hold
               July 10, 1996.  Engineering is researching a problem
               reported with the ALPSHAD10_061 kit.  A replacement
               ECO is scheduled for the near future.

        Future OpenVMS Alpha V6.1 kits that are issued  for  facilities
        included  in  the ALPSHAD09_061 kit will not install unless the
        ALPSHAD09_061 kit is installed on your  system  first.   It  is
        highly recommended that the complete ALPSHAD09_061 remedial kit
        be installed as soon as possible.  Installation  of  individual
        images from the ALPSHAD09_061 remedial kit is not supported and
        could result in unpredictable system behavior.

        If  you  have  a  mixed-architecture  cluster,  and  have   not
        previously installed a shadowing kit, you must install the  VAX
        version of this kit on the VAX nodes and the Alpha  version  on
        the Alpha nodes of the cluster BEFORE you bring up  both  types
        of systems in a cluster again.  If both kits are not installed,
        you may not be able to create shadow sets.

        If you have previously installed a shadowing kit  then  you  do
        not need to install the VAX version of this kit at this time as
        long as the shadowing kit installed on the  VAX  nodes  of  the
        cluster is VAXSHAD04_061 or later.

        Working  configurations  that  contain  SCSI  shadow  sets   on
        dissimilar controllers may no longer work.


ECO KIT SUMMARY:

An ECO kit exists for Volume Shadowing on OpenVMS Alpha V6.1 through
V6.1-1H2.  This kit addresses the following problems:

Problems Addressed in the ALPSHAD09_061 Kit for OpenVMS Alpha V6.1,
V6.1-1H1, V6.1-1H2:

  o  Shadowing crash immediately upon booting system with shadowed system
     disk, in SHSB$READ_SCB.

  o  A two member shadowset with member index 0 a copy target and index 1
     the only source member experiences a node failure on a node serving
     the disks.  The source member goes "available". The source index is
     never PACKACKed (Packet Acknowledgment) and the system remains with
     the set hung in mount verification forever.

  o  If Shadowing tries to mark a block bad on all disks because it is
     bad on the source(s) and encounters an error, it may return an
     incorrect status to the user.  The status will be SS$_NORMAL for
     MSCP devices and may be SS$_UNSUPPORTED for non-MSCP devices (as
     determined by routine SHSB$CHECK_MSCP).  An SS$_NORMAL status is
     misleading because it indicates that all blocks were correctly
     marked bad, and SS$_UNSUPPORTED does not appear to be a valid
     return status for shadowing I/Os.

  o  Removing a Disk Copy Data (DCD) copy target and adding it back again
     causes the source of the DCD copy to change.  This can cause the
     copy to be non-assisted if the alternate source isn't on the same
     controller.

  o  If a DCD copy is interrupted by a mini-merge, the copy restarts
     at 0% copied (LBN 0) rather than continuing from where it left off.
     DCD copies should restart at the last copied LBN after being
     interrupted by a mini-merge.

  o  Failures to start or restart copies, usually after a node halt,
     shutdown or reboot.  Additional symptoms observed include
     inconsistent values for HBS_CIP when compared to SHADOW_MAX_COPY,
     negative values for HBS_CIP, and copies that should have continued
     being started over from the beginning.

  o  Demote CMPL to CMPW for #SS$_* to prevent incorrect status handling.

  o  TPU would output SPR text if a user pressed CTRL/C during the
     compile of TPU code that contained errors.  Users often do this when
     they accidentally try to compile non-TPU code or their procedure has
     many coding errors in it.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  If a three member Shadowset has its index zero member as a copy
     target and all three members also require a MERGE, then when the
     COPY completes the MERGE does not take place.  The LBN for the just
     completed COPY (the last LBN on the disk) is passed as the MERGE
     starting LBN.  So it completes without doing any IO.

  o  When MONITOR is run on a terminal with more than 24 lines, MONITOR
     still uses only 24 lines.  For several classes (PROCESS, DISK, and
     CLUSTER), it would be nice if MONITOR could use the additional
     lines.  This ECO provides support for the PROCESS class - the one
     that could use it most.

     This feature was provided in OpenVMS Alpha V6.2.

  o  Specifying MONITOR RMS with the /PERCENT qualifier causes
     MONITOR to terminate unexpectedly with an ACCVIO.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  Specifying the DISK class to MONITOR can result in unexpected side
     effects to the display.  When the MONITOR DISK command is issued on
     a system with DFS (DECdfs for OpenVMS Systems) devices mounted, only
     the first three characters of the DFS name are displayed correctly.
     Instead of the fourth character, the low byte of the unit number is
     output.  It is often displayed as a non-printable character or as
     an escape sequence (in which case it may cause terminal lock-ups,
     reset terminal characteristics, etc.).

  o  Due to an inadequate synchronization mechanism, the MONITOR DISK
     or MONITOR CLUSTER command can go into an infinite loop on 
     multi-processor machines.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  A DCD copy is not always performed when it would be valid to do
     so.  This results in a non-assisted FULL copy operation, which
     takes much longer.

  o  Event Flag not set when completion AST also specified on $ENQ.

  o  A problem would occur if a satellite were to crash and then attempt
     to boot back into the cluster (in a SCSI CLUSTER). The physical
     device would be unavailable to the satellite so that it would never
     be allowed to boot back into the cluster.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  On multi-interconnect clusters, there is a window which will allow a
     lock remaster operation to complete without all interested nodes
     pointing to the new master.  This usually results in a number of
     nodes crashing with LOCKMGRERR bugchecks.  The situation is only
     possible after a node CLUEXITs.  Other required conditions are that
     the node which CLUEXITs must have a LOCKDIRWT of zero, such that a
     partial lock rebuild occurs after the CLUEXIT.  If a SS$_NODELEAVE
     error is returned for a node which is to participate in the
     remaster, we must stop the remaster from completing, and allow the
     lock rebuild to clean things up.

  o  A SET SECURITY or SET ACL on volumes on the cluster places a high
     I/O load on the server process.  This exhausts paged pool and
     AUDIT_SERVER goes into an RWPAG state.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  A field in the IRP that is used during Volume Processing was not
     initialized in clones of USER IOs.  If an error occurs, the code
     that determines the severity of the error can be misled by data in
     these fields.  It can fail to locate the error and return the IO as
     successful.  Since we also return a zero Byte count the User would
     see an Incomplete Segmented Transfer error.  The fix is to
     initialize the field when the clone is allocated.

  o  Listings are sometimes difficult to follow because there are varied
     format conventions used and some comments are misleading or missing.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  Certain applications calling $AUDIT_EVENT with AST's turned off will
     be interrupted when $AUDIT_EVENT returns to caller.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  Code relies on page being present when trying to release spinlock
     and if the system is paging heavily, this might not be the case.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  Repeating wakeups from $SCHDWK show an accumulating drift over time.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  COPY and/or BACKUP of a DISK to a TMSCP-served TAPE will fail when
     the tape device is placed in a MV (mount verification) state.  The
     failure does not occur if the same task is performed locally.

     COPY will fail with: "SYSTEM-F-TAPEPOSLOST, magnetic tape position lost".

     BACKUP will fail with:  "-SYSTEM-F-DATALOST, data lost".

     This problem is corrected in OpenVMS Alpha V6.2.

  o  To transition an OpenVMS process from the virtual balance set to the
     real balance set, the SPTE's (system page table entries) which
     describe its process PTE pages (process page table pages) need to be
     copied from saved memory back into the real balance slot from whence
     they originally came.  This makes the process' P0 and P1 space
     accessible again.  SPTE's for the process page table pages
     describing the undefined area between P0 and P1 must be represented
     by pre-initialized null values (actually, ERKW DZERO-type values).
     When this undefined void area is exactly zero pages (i.e., P0 and P1
     are tangent), the VBSS$READ_OPT2_VBSM routine takes the wrong
     branch, causing a VBSSERR bugcheck.  This fix adds a test for this
     case and takes the correct branch.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  When a process is switched from a real balance slot to a virtual
     balance slot, the allocation fails, causing a VBSSERR bugcheck.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  When process quota (BYTLM) is returned to a process for a created
     system global section, the returned quota value is now computed
     correctly.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  System crashes due to corrupted PTE entries.  The corruption appears
     to be Global Section Table Entries pointing to Global Section
     Descriptors.

     The problem occurs only if 4095 GBLSECTIONS is exceeded.  To check
     the number of Global Sections currently in use add the following
     values:

       o  SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL !global sections

       o  SDA> VALIDATE QUEUE EXE$GL_GSDDELFL !delete pending global sections

       o  SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL !group global sections

  o  Devices can remain allocated to processes that no longer exist.  The
     device remains unusable until the system is rebooted.

  o  If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW
     command and a new shadow set is created using this disk, OpenVMS
     Alpha will attempt to create the old shadow set using the old
     physical device names.

  o  The system crashes with a NOBVPVCB bugcheck.  The crash occurs on
     the kernel stack with MTAAACP.EXE as the current image.

  o  The system crashes with an XQPERR while dismounting a MAD drive.

  o  SUBTRACED errors not correctly determined for images installed
     /HEADER_RESIDENT.

     This problem is corrected in OpenVMS Alpha V6.2.

  o  Users of RDB V6.1 may get ILLIOFUNC errors when doing IO to a Host
     Based Shadowset whose members are served.

  o  The user will see a large number of the shadow copies being done by
     OpenVMS rather than the controller, even when both disks are on the
     same controller and the controller has DCD capabilities.

  o  System hang when I/Os pending to a shadow set do not complete.

  o  In previous shadow kits two new fields were added to the IRP data
     structure for shadow write logging information.  This new IRP
     definition size conflicted with the IRP sizes of other images on the
     system that were not part of the SHADOW kits.  This conflict could
     cause a variety of errors including fatal bugchecks.  This fix
     changes the IRP definitions back to the SSB versions and also adds
     some special definitions to the SHDRIVER for the new IRP fields.

  o  Fatal bugcheck from data structure corruption due to the value 10
     HEX being added to the corrupted field.  Crashes are of various
     types including node and cluster crashes, crashes due to invalid UCB
     addresses, invalid VCB addresses, invalid member IDs, invalid number
     of devices etc.

Problems Addressed in the ALPSHAD07_061 Kit for OpenVMS Alpha V6.1,
V6.1-1H1, V6.1-1H2:

NOTE:  Although this kit contains previous fixes that may be applied
       to OpenVMS Alpha V1.5, beginning with the AXPSHAD06_061 ECO kit,
       there will be no new fixes included for OpenVMS Alpha V1.5.  If
       your system is running OpenVMS Alpha V1.5 and you are experiencing
       the problems listed in the PROBLEMS ADDRESSED IN AXPSHAD06_061 KIT
       FOR OPENVMS AXP V6.1 below, it is strongly recommended that you
       upgrade to OpenVMS Alpha V6.1 as soon as possible.

  o  Fatal bugchecks from data structure corruption may occur due to the
     addition of the value 10 HEX to the corrupted field.  Crashes are of
     various types and include node and cluster crashes, crashes due to
     invalid UCB addresses, invalid VCB addresses, invalid member IDs,
     and invalid number of devices.

  o  There is a race condition possible when a CFCB (Cache File Control
     Block) is being deleted due to XQP action and cache space is being
     reclaimed from a LIMBO file.

  o  Under certain conditions, a fork lock used by the virtual I/O cache
     may be created with an incorrect length.  This results in
     unsynchronized data access which can cause corruption.

  o  When a satellite node in a SCSI cluster crashes, the MSCP server
     marks the physical device as offline which prevents the satellite
     node from being able to boot back into the cluster.

Problems Addressed in the AXPSHAD06_061 Kit for OpenVMS Alpha V6.1:

  o  Incorrect information in Register 6 and Register 7 causes the system
     to crash with a REGCORDET register corruption bugcheck.

  o  If the system manager fails to set the value of the ALLOCLASS SYSGEN
     parameter and then attempts to use shadowing, a shadow volume can be
     created, but new members cannot be added to the shadow set.  No
     error messages are received until an attempt is made to add a second
     member to the shadow set.  Using the following DCL 'MOUNT' command,
     the following error messages appear:

          $ MOUNT/SYSTEM DSA500 /SHADOW=DKB400 ALPHAVMS015
          %MOUNT-I-SHDWMEMFAIL, DKB400 failed as a member of the shadow set
          -SYSTEM-F-INCSHAMEM, incompatible shadow set member

     "Incompatible" is not a true statement of the problem.  It is
     actually due to "missing allocation class," or "incorrect allocation
     class."

  o  I/O to a shadow set may become stalled if a shadow set member is
     dismounted at the same time from multiple nodes within a cluster.

  o  MOUNT will not add shadow set members unless they are either MSCP or
     SCSI.

  o  Shadow set member expulsion is currently based on the time it takes
     for a fork and wait and a PACKACK (Packet Acknowledgment) to
     complete rather than the actual time transpired.  On some devices,
     particularly SCSI devices, where a PACKACK can take approximately
     one minute, the timeout was much too long.  Using the default value
     of 20 (seconds) for SHADOW_MBR_TMO would actually mean that it would
     take 20 minutes to expel a member that is experiencing errors from a
     SCSI shadowset.

  o  SHDRIVER loss of synchronization may result in a SHADDETINCON
     crash triggered by the check at the end of MATCH_MASTER_SCB.  This
     consistency check expects SHAD$W_DEVSTS_PASSIVE_MV_CNTR to be
     zero, but it is not.  Another symptom is that the virtual unit
     UCB$W_RWAITCNT is zero.  Shadow set member counts of zero may
     also be seen.

  o  Crashes may occur in EXPEL_PACKACK_ANY with connections broken to
     all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).

  o  All members of a shadow set become inaccessible at the same time and
     remain inaccessible for a period of time greater than "shadow member
     timeout" (SHADOW_MBR_TMO  or  SHADOW_SYS_TMO) seconds but less than
     MVTIMEOUT seconds.  All members subsequently become accessible
     within seconds of each other but not at exactly the same time.  This
     results in all but one member being expelled from the shadow set.

     This often occurs when changing HSJ microcode and all members are
     connected to the same HSJ.  When brought back online, polling will
     cause the devices to be found seconds apart which will result in all
     but one member being expelled.

  o  All members of the set must be checked to see if they meet the
     criteria of being MSCP.  The original design did not allow for
     having no index zero member.

  o  In a cluster, using $PROCESS_SCAN explicitly or implicitly with the
     DCL command, SHOW USER, sometimes causes a system crash due to an
     ACCVIO in kernel mode or an IVSSRVRQST bugcheck.

  o  When a node with a SCSI bus boots, it resets the SCSI bus.  In a
     multi-host SCSI cluster, this can cause the other node to experience
     I/O failures.  Normally, this results in a brief mount verification.
     The I/O is retried, succeeds, and there is no serious consequence.
     However, if the other node is in the process of booting and the
     system disk is a shadow set, the system will crash.

  o  PGFIPLHI bugcheck in the SHADOW_SERVER process at the REMQUE in
     K_GET_COPYSHAD_IRP.  On OpenVMS Alpha, the PC is A0E and the VA is 274.

  o  A double-deallocation crash may occur as the result of MOUNT not
     properly initializing the MTL pointer.  This error causes the
     pointer to have a stale value as a result of 2 calls to SYS$VMOUNT
     from a single program.  The problem will not happen as a result of
     DCL commands, since the cells are initialized at image activation.
     The stale pointer will only cause a problem if the system is unable
     to allocate space for defining the logical name.

  o  If a user attempts to mount a disk that is 100% full and the disk
     was originally initialized with a version of OpenVMS Alpha prior to
     the one currently in use, paged pool can be corrupted.  This leads
     to system crashes.  If the disk is filled AFTER it has been mounted,
     there will not be any problem.

  o  Tape devices with stacker/loaders, such as the TF857, may take up to
     6 minutes to Rewind/unload/load the next tape.  A change was made to
     the behavior of MOUNT to take this delay into account.  However, a
     side effect of this change is that non-stacker drives may also wait
     6 minutes before failing.

  o  Processes may hang in RWNPG state while waiting for a request for
     NPP (non-paged pool) so large that it cannot be satisfied.

  o  A system crash may occur with the current process executing a
     $CHKPRO system service call.  This happens when one routine running
     in user mode is interrupted by a KERNEL mode AST which activates a
     routine that uses the same memory.

  o  If a multi-programming application uses a non-homogenous access
     pattern to a file which is resident in Virtual I/O cache, there is a
     possibility that the size returned in the I/O status block from a
     READ operation will be truncated.

     If a clustered application uses a large number of concurrent
     processes to perform file operations consisting of an OPEN, WRITE,
     and CLOSE sequence repetitively on the same data file, data
     corruption may occur.

     In a multi-programming environment where a significant amount of NEW
     data from a file is being loaded into the cache concurrently by
     multiple processes, the system may HANG.

  o  When a value block or value status block can not be returned,
     SYS$GETLKI returns the error SS$_ILLRSDM.  A correction has been
     made to SYS$GETLKI to now return all other requested information
     and update the wildcard search index.

  o  The Audit Server EXCLUDE process list becomes corrupt after a
     SET AUDIT/EXCLUDE=pid command is issued.

  o  Data corruption may occur in the file container during the use of
     PATHWORKS.  The corruption can be shown by running CHKDSK on the PC
     container disk.  Using PCDISK to IMPORT and EXPORT files to and from
     the container will show corrupted files when EXPORTed back to OpenVMS.
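
For the ALLOCLASS problem described above, the current value of the
parameter can be displayed before members are added to a shadow set.
The following is a minimal sketch using the SYSGEN utility; a value
of 0 indicates that no allocation class has been set.

```dcl
$ ! Display the current ALLOCLASS system parameter.
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SHOW ALLOCLASS
SYSGEN> EXIT
```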


Problems Addressed in AXPSHAD04_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1,
and V6.1-1H2 only:

  o  When booting two or more systems simultaneously from shadowed system
     disks, the systems may appear to hang.  Crashing the systems and
     examining the crash dumps indicates that shadowing driver blocking
     AST routines have not run.

  o  When a node runs out of SHADOW_MAX_COPY threads while mounting new
     copy target units, other nodes in the cluster that have available
     SHADOW_MAX_COPY threads will not pick up the copy work.  This
     results in the copy not being started for copy members that are
     added to shadow sets.

Problems Addressed in AXPSHAD02_061 Kit for OpenVMS Alpha V6.1, V6.1-1H1,
and V6.1-1H2 only:

  o  While running a UETP tape test, fatal controller errors occur.  This
     problem is caused by the incorrect interpretation of a TUDRIVER
     status subcode by TMSCP (the tape server).  After the installation
     of this ECO kit, a fatal controller error status is returned to the
     user when this occurs.

  o  Shadow sets have separate mount verification done by SHDRIVER,
     instead of the usual system mount verification.  The SHDRIVER mount
     verification has an error updating the volume label on shadow sets
     that have the volume label changed except on the node that issues
     the label change.  Once the devices are in this state, they can not
     be recovered until MVTIMEOUT is reached or a reboot of all affected
     nodes is performed.

     This correction enables the behavior of virtual units to be
     consistent with the behavior of physical units.

  o  Unnecessary calls to MOUNT verification or host-based volume
     shadowing processing may occur.  On Alpha nodes, these mount
     verification or Host-Based Volume Shadowing processing calls will
     fail, resulting in I/O hangs and, eventually, volume invalid errors.

  o  AVAILABLE or OFFLINE status returned from a transfer command does
     not implement the MSCP specification correctly.

  o  OpenVMS VAX MSCP Parity with OpenVMS Alpha.  A served disk may
     appear to be ONLINE when it is really OFFLINE.  This occurs because
     the MSCP server's CHECK_SERVICE routine searches the device database
     and incorrectly returns an ONLINE status.

  o  There is no synchronization between SHADOW_PROCESSING and
     INVALIDATE_ALL_ENTRIES, which allows these two code threads to
     run simultaneously.  This can cause a system crash due to the
     fact that the SHADOW_PROCESSING thread may remove a member from
     a multimember shadow set and the INVALIDATE_ALL_ENTRIES thread
     is not aware that the member has been removed.  The system
     crash occurs in RESTORE_WLE because no Write Log table
     exists.

  o  A problem exists with the SHADOW_SERVER.  Several symptoms
     of this problem are:

       +  Undiagnosable hangs in individual copy operations or on
          the entire server

       +  Unexpected copy aborts

       +  Poor copy performance

       +  Shadow set inconsistency

     An optional new system logical name, SHAD$COPY_BUFFER_SIZE, has
     also been added.   This system logical name can be used to control
     the buffer size of shadow copies.  SHAD$COPY_BUFFER_SIZE has a
     maximum size of 127 blocks (default) and a minimum size of 31
     blocks.  The size can be changed by using the DEFINE/SYSTEM
     command.

  o  High interrupt stack activity occurs on a node performing a merged
     copy operation.  This could adversely affect configurations using
     HSJ40 controllers with many shadow sets.

  o  Data inconsistency may exist between members of a Phase II shadow
     set.  This occurs under very heavy I/O operations to a shadow
     set while the members of that shadow set are undergoing failover
     from one controller to another.

  o  Invalid Command status processing of Write History Management
     commands unconditionally puts an entry into the error log.
     This occurs even when there is no actual error.

  o  A second shadow server may accidentally be created using the
     startup command procedure.  This results in desynchronization
     of shadow sets.  The startup procedure has been modified so
     that it does not allow multiple servers.

  o  When a serving node becomes so busy that it occasionally
     exhausts resource limits, the RWAITCNT for heavily used disks
     gets incremented.  If a client node requests an ONLINE while
     RWAITCNT is bumped, the request is rejected by MSCP.  This
     makes MOUNTing devices very difficult.

  o  After a system failure, the number of blocks to be rewritten
     is not computed correctly.  This may cause inconsistent data
     between shadow set members.  This occurs during an assisted
     merge when the information regarding which LBNs to include
     is only requested from one shadow set member.

  o  A process issuing I/O to a TMSCP tape device may appear to
     hang after a controller failover attempt.  This is caused by
     an incorrect check of the cached data's lost error status,
     which results in an endless loop trying to recover a
     nonexistent error.

  o  OpenVMS Alpha systems are unable to reboot an MSCP controller,
     such as an HSC.  This might result in stalled pending I/O
     to MSCP or TMSCP devices.

  o  A device may be mounted by an MSCP server, even though a local
     controller could be used.  This situation may still occur after
     the installation of this ECO kit under extreme timing circumstances.

  o  When new MSCP server I/O is sent to a device that is RWAITCNT
     stalled and the connection from the driver to the device fails,
     server I/O is posted to the restart queue if it is active.  If
     not, they are incorrectly left on the UCB (Unit Control Block)
     pending queue.  This causes shadow sets to appear to be stalled.

     If the connection from the client to the server then fails,
     I/O from the client that has been passed to the driver is
     then allowed to complete.  If this I/O is stalled on the
     pending queue, it completes much later, possibly after
     the client has reissued the stalled I/O.

  o  I/O hangs to a shadow set might occur because the shadowing
     driver has no way to disable write logging if the write log
     entries are mismanaged or depleted to a point that the
     shadow set is unusable.

  o  An Invalid Exception bugcheck might occur in DUDRIVER during
     I/O request complete processing.

  o  In the past, MSCP could only serve 256 disks.  It can now
     serve 512.

  o  During disk and tape error recovery, MSCP is unable to perform
     a TMSCP controller reset, which results in a system crash.

  o  During the processing of a write-log entry in SHDRIVER, a
     register value may be improperly maintained if the system
     is low on nonpaged pool.  This will cause a system crash
     with an INVEXCEPTN Bugcheck within SHSB$GET_WLE_TABLE in
     module SHDSUBS when the entry is resumed.

  o  In the past, Volume Shadowing checked device IDs and the
     maximum logical block numbers (LBNs).  Volume Shadowing
     now checks for geometries and maximum LBNs.  This
     enables devices like the RZ28 and RZ28B to operate in
     the same shadow set.  Even though their device IDs differ,
     their geometries and maximum LBNs will match when configured
     on like controllers.

     NOTE:  If this remedial kit is installed across a VMScluster
            system, SCSI shadow sets that are configured across
            different controller types are not supported and will
            no longer work.

  o  After approximately 18 hours of operation, some OPCOM
     messages that should be logged are skipped.

  o  If two members of a three-member shadow set are
     simultaneously removed, either intentionally or in
     a failover situation, the system may hang or fail.

  o  System crashes might occur during virtual I/O cache (VIOC)
     expansion under the following circumstances:

       +  Multiple processes (or processors) are accessing the same
          file concurrently;

       +  The cache space for that file was being expanded;

       +  That expansion caused the need for a new hash table
          structure.

  o  When subjected to a high I/O load and multiple failures,
     the write logging (minimerge) and shadowing synchronization
     subsystems become unreliable.

  o  Unreliable shadow subsystem behavior and shadow-set hangs
     result from VMScluster nodes failing to relinquish shadow-set
     resources.

  o  The TMSCP server bugchecks in TMSCP$FIND_UQB when a command
     that refers to a specific unit is processed and that unit
     does not have the Server Local Unit Number (SLUN) bit set.

     The fix contained in this ECO kit will cause the bugcheck
     to occur in TUDRIVER instead of the TMSCP server.

  o  I/O may stall to a served shadow-set member.  Load balancing
     makes this condition more likely.

  o  System crashes may occur during processing of stale I/O in
     Host-Based Volume Shadow Sets.  This I/O does not properly
     reflect changes in shadow set configuration, such as removal
     of members and changes in the write-logging state.

  o  Shadow set members may be inconsistent after the failure
     of a node accessing a shadow set served by an Alpha node.
     The amount of corrupted data depends on previous I/O
     operations to the shadow set.
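
The change in shadow set membership checking described above (device
IDs replaced by geometries, with maximum LBNs checked in both cases)
can be illustrated with a minimal sketch.  This is not SHDRIVER code;
the Member type, the function names, and every numeric value below are
hypothetical and chosen only to show why an RZ28 and an RZ28B on like
controllers fail the old check but pass the new one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical description of a candidate shadow set member."""
    device_id: str
    geometry: tuple  # (sectors, tracks, cylinders) as seen via the controller
    max_lbn: int


def compatible_old(a: Member, b: Member) -> bool:
    # Old rule from the text: device IDs and maximum LBNs must match.
    return a.device_id == b.device_id and a.max_lbn == b.max_lbn


def compatible_new(a: Member, b: Member) -> bool:
    # New rule from the text: geometries and maximum LBNs must match.
    return a.geometry == b.geometry and a.max_lbn == b.max_lbn


# Placeholder values only; these are NOT real RZ28 or RZ28B parameters.
rz28 = Member("RZ28", (10, 20, 30), 5999)
rz28b = Member("RZ28B", (10, 20, 30), 5999)

# Same disk model behind a different controller type, assumed here to
# report a different geometry (the unsupported case in the NOTE).
rz28_other_ctrl = Member("RZ28", (15, 20, 20), 5999)
```

Under these assumptions the geometry rule accepts the RZ28/RZ28B pair
and rejects the same disk model when it sits behind a controller that
reports a different geometry, which is the configuration the NOTE
describes as no longer working.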

Problems Addressed in AXPSHAD01_061 Kit for OpenVMS Alpha V6.1 only:

  o  In Volume Shadowing for OpenVMS Alpha V6.1, minimerge
     functionality across mixed architecture VMSclusters was disabled.
     In order to reestablish the minimerge functionality, install this
     kit across any VMScluster that contains an OpenVMS Alpha V6.1 node.

     After installation of this kit, the entire cluster must be
     rebooted simultaneously.  Rolling upgrades are *NOT* supported.

  o  Mounting an RZ28B disk device with an RZ28 in the same
     shadow set is not allowed and will display the following error:

     %MOUNT-I-SHDWMEMFAIL, $1$DUA0 failed as a member of the shadow set
     -SYSTEM-F-INCSHAMEM, incompatible shadow set member

     This behavior is seen when RZ28/RZ28B shadow set members are
     connected with a local SCSI (Small Computer System Interface)
     controller.

     With this kit, RZ28 and RZ28B devices can be combined in a
     shadow set if they are connected to like controllers.

     NOTE:  If this kit is installed across a VMScluster, SCSI
            shadow sets configured across different controller
            types are not supported and will no longer work.

     VMSclusters with shadowed SCSI disks and mixed-architecture
     VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot
     the entire cluster simultaneously, so that the entire VMScluster is
     running the same version of Volume Shadowing software.

  o  In a VMScluster (mixed Alpha/VAX environment), shadow sets served to
     the DEC 3000 Model 300 are reported as MEDOFL.  A DCL command, 'SHOW
     DEVICE/SERVED', from a VAX 6000 Model 400 shows the shadow sets as
     AVAILABLE.


INSTALLATION NOTES:

If you are using the Shadowing option, it is highly recommended that
this kit be installed.

  o  When you install the ALPSHAD09_061 remedial kit you must also
     install the ALPSHAD10_061 or later remedial kit before rebooting
     your system.  Installing the ALPSHAD09_061 kit without installing
     the ALPSHAD10_061 or later kit could lead to system instability.

  o  Future OpenVMS Alpha V6.1 kits that are issued for facilities
     included in the ALPSHAD09_061 kit will not install unless the
     ALPSHAD09_061 kit is installed on your system first.  It is highly
     recommended that the complete ALPSHAD09_061 remedial kit be
     installed as soon as possible.  Installation of individual images
     from the ALPSHAD09_061 remedial kit is not supported and could
     result in unpredictable system behavior.

  o  This kit *MUST* be installed on every Alpha in a mixed-architecture
     VMScluster, and the VAX version of this kit *MUST* be installed on
     every VAX system in the cluster BEFORE any systems are rebooted
     into the VMScluster.  If both kits are not installed, shadow sets
     cannot be created.

  o  Working configurations that contain SCSI shadow sets on dissimilar
     controllers may no longer work.

  o  VMSclusters with shadowed SCSI disks and mixed-architecture
     VMSclusters running OpenVMS Alpha V6.1 must apply the kit and reboot
     the entire cluster simultaneously.  In these cases, rolling upgrades
     are not supported.
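
Before the simultaneous reboot, it may help to confirm that every node
already has the kits the notes above call for.  The following is a
minimal sketch of such a pre-reboot check, not a supported tool.  The
inventory format and node names are hypothetical, the VAX kit name is
a placeholder because it is not given in this document, and "or later"
versions of ALPSHAD10_061 are not modeled.

```python
# Kits each architecture needs before any node is rebooted into the cluster.
# "VAX-KIT-PLACEHOLDER" is hypothetical: the VAX kit name is not given here.
REQUIRED_KITS = {
    "alpha": {"ALPSHAD09_061", "ALPSHAD10_061"},
    "vax": {"VAX-KIT-PLACEHOLDER"},
}


def nodes_not_ready(nodes):
    """Return the names of nodes still missing a required kit."""
    return [
        name
        for name, (arch, installed) in nodes.items()
        if not REQUIRED_KITS[arch] <= set(installed)
    ]


# Hypothetical inventory: node name -> (architecture, installed kits).
cluster = {
    "ALPHA1": ("alpha", ["ALPSHAD09_061", "ALPSHAD10_061"]),
    "ALPHA2": ("alpha", ["ALPSHAD09_061"]),
    "VAX1": ("vax", []),
}

print(nodes_not_ready(cluster))  # prints ['ALPHA2', 'VAX1']
```

An empty result would indicate, under these assumptions, that the
cluster-wide reboot can proceed; any listed node still needs a kit
installed first.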

For more information, please see the Problem Description section of the
Cover Letter/Release Notes supplied with this kit.

Files on this server are as follows:

     alpshad09_061.README
     alpshad09_061.CHKSUM
     alpshad09_061.CVRLET_TXT
     alpshad09_061.a-dcx_axpexe