TITLE: OpenVMS__SHADOW VAXSHAD09_U2055 VAX V5.5-2__V5.5-2H4 ECO Summary
 
Modification Date:  12-JUL-1999
Modification Type:  DOCUMENTATION:  Technical Modification  
                      Added note regarding V5.5-2HW.

NOTE:  An OpenVMS saveset or PCSI installation file is stored
       on the Internet in a self-expanding compressed file.
       The name of the compressed file will be kit_name-dcx_vaxexe
       for OpenVMS VAX or kit_name-dcx_axpexe for OpenVMS Alpha.
 
       Once the file is copied to your system, it can be expanded
       by typing RUN compressed_file.  The resultant file will
       be the OpenVMS saveset or PCSI installation file which
       can be used to install the ECO.
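
As an illustration of the naming rule in the note above, the following
Python sketch builds the compressed file name and the matching RUN
command for a kit.  The function names are hypothetical and exist only
for this example; the exact file name as it appears on your system
after the copy is an assumption and may differ.

```python
def compressed_kit_name(kit_name: str, architecture: str) -> str:
    """Return the self-expanding file name for a kit, per the naming note."""
    # Suffixes taken from the note: one for OpenVMS VAX, one for OpenVMS Alpha.
    suffixes = {"VAX": "dcx_vaxexe", "Alpha": "dcx_axpexe"}
    if architecture not in suffixes:
        raise ValueError(f"unknown architecture: {architecture!r}")
    return f"{kit_name}-{suffixes[architecture]}"


def expand_command(kit_name: str, architecture: str) -> str:
    """Build the DCL command line that expands the compressed file."""
    return f"$ RUN {compressed_kit_name(kit_name, architecture)}"
```

For this kit on OpenVMS VAX, expand_command("VAXSHAD09_U2055", "VAX")
yields "$ RUN VAXSHAD09_U2055-dcx_vaxexe".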
 
Copyright (c) Compaq Computer Corporation 1995, 1999.  All rights reserved.

NOTE:  This ECO kit *CANNOT* be installed on OpenVMS VAX
       V5.5-2HW.  Please upgrade your system to OpenVMS
       VAX V5.5-2 before attempting to install this kit.

       ******************  WARNING  *********************
       *                                                *
       *  **DO NOT** install this ECO kit on OpenVMS    *
       *  VAX V5.5-2HF.  The system will become         *
       *  unbootable.  Sustaining Engineering is        *
       *  currently researching this problem.           *
       *                                                *
       **************************************************

PRODUCT:     Volume Shadowing for OpenVMS  (Phase II)

             NOTE:  The problems fixed in this ECO Kit also affect the
                    following products:

                         VAXcluster Software for OpenVMS VAX
                         VAXcluster Console System (VCS)

OP/SYS:      OpenVMS VAX

COMPONENTS:  System, Bugcheck, Backup,
             Mount, Dismount, MSCP, TMSCP, MTAAACP,
             I/O Routines, Audit Server,
             Security, System Primitives,
             Adaptive Pool Management (APM),
             Operator Communication Manager (OPCOM),
             User Environmental Test Package (UETP)

SOURCE:      Compaq Computer Corporation

ECO INFORMATION:

     ECO Kit Name:  VAXSHAD09_U2055
     ECO Kits Superseded by and Included in this ECO Kit:

          VAXSHADFT9_U2055 (Never Released)
          VAXSHADFT8_U2055 (Never Released)
          VAXSHAD07_061    (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
          VAXSHAD06_061
          VAXSHAD05_061
          VAXSHAD04_061
          VAXSHAD03_061
          VAXSHAD01_061    (CSCPAT_1160)
          VAXSHAD02_060    (CSCPAT_1116)
          VAXSHAD01_060    (CSCPAT_1116)
          VAXSHAD08_U2055  (CSCPAT_0269, CSCPAT_1160)
          VAXSHAD07_U2055  (CSCPAT_0269, CSCPAT_1160)
          VAXSHAD05_U2055  (CSCPAT_0269, CSCPAT_1160)
          VAXSHAD04_U2055  (CSCPAT_0269, CSCPAT_1160)
          VAXSYS14_061     (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
          VAXSYS16_U2055
          VAXSYS15_U2055
          VAXSYS14_U2055
          VAXSYS13_U2055
          VAXSYS12_U2055
          VAXSYS11_U2055
          VAXSYS10_U2055
          VAXSYS09_U2055
          PRCMGT$01_U2055
          VAXSYS01_2H4055  (CSCPAT_1094)
          VAXSYSL04_U2055
          VAXMONT01_061 (For OpenVMS VAX V5.5-2, V5.5-2H4, V5.5-2HF only)
          VAXMONT03_U2055
          VAXMONT02_U2055
          VAXMOUN05_U2055 (CSCPAT_1152)
          VAXMOUN04_U2055 (CSCPAT_0240)
          VAXMOUN03_U2055 (CSCPAT_0240)
          VAXMSCP08_U2055 (CSCPAT_1120)
          VAXMSCP07_U2055
          VAXMSCP05_U2055 (CSCPAT_1068)

     ECO Kit Approximate Size:  2952 Blocks
     Kit Applies To:  OpenVMS VAX V5.5-2, V5.5-2H4

     NOTE: OpenVMS VAX V5.5-2H4 is a limited hardware release,
           shipped only with the new systems (or system upgrades)
           listed below. It is not separately orderable and will not
           be distributed via Consolidated Distribution.

           o VAX 4000 Model 100A
           o VAX 4000 Model 500A
           o VAX 4000 Model 600A
           o VAX 4000 Model 700A

     System/Cluster Reboot Necessary:  Yes


                           *** WARNING!!! ***

     Future OpenVMS VAX V5.5-2 kits that are issued for facilities
     included in the VAXSHAD09_U2055 kit will not install unless the
     VAXSHAD09_U2055 kit is installed on your system first.  It is
     highly recommended that the complete VAXSHAD09_U2055 remedial kit
     be installed as soon as possible.  Installation of individual
     images from the VAXSHAD09_U2055 remedial kit is not supported and
     could result in unpredictable system behavior.

     Descriptions for problems that were corrected in previous VAX
     Shadow kits are included in the VAXSHAD09_U2055 Release Notes.
     The release notes can be found in the VAXSHAD09_U2055.A
     saveset.  If you have not installed a previous shadow kit,
     it is recommended that you read these release notes before
     installing the VAXSHAD09_U2055 Shadow kit.  To access the release
     notes, restore them from the saveset by issuing a command with the
     following format:

        $ BACKUP/SEL=VAXSHAD09_U2055.RELEASE_NOTES -
        _$ DEVICE:[DIR]VAXSHAD09_U2055.A/SA -
        _$ DEVICE:[DIR]VAXSHAD09_U2055.RELEASE_NOTES

     If you have a mixed-architecture cluster and have not
     previously installed a shadowing kit, you must install this kit
     on the VAX nodes, as well as the applicable Alpha version of
     this kit on the Alpha nodes of the cluster, BEFORE you bring up
     both types of systems in a cluster again.  If both kits are not
     installed, you may not be able to create shadow sets.

     If you have previously installed a shadowing kit then you do
     not need to install the Alpha version of this kit at this time
     as long as the shadowing kit installed on the Alpha nodes of
     the cluster is ALPSHAD04_061 or later.

     Working configurations that contain SCSI shadow sets on
     dissimilar controllers may no longer work.


CAUTION:

Before Installing this Kit, Read the Following Cautions:

     After installation of this kit, the following issues may occur:

      1)  ISSUE:  When a node reboots into the cluster there may not
                  be an OPCOM message that reports the node is joining
                  the cluster.  Absent messages occur on a random
                  basis.

          WORKAROUND:  In order to verify the node has entered the
                       cluster, after the node has fully rebooted, the
                       user should enter the command:

                            $ SHOW CLUSTER

                       to verify the node is a valid member of the
                       VAXcluster.

      2) ISSUE FROM THE CSC:  An INVEXCEPTN in SNDRIVER may be seen if
                              DECnet/SNA V2.1 is used in conjunction
                              with the IO_ROUTINES from the VAXSHAD
                              ECO kit.  SNAVMS_E04021 (CSCPAT_5041) will
                              fix this problem by replacing the
                              incompatible SNDRIVER in DECnet/SNA V2.1.

                              NOTE: SNAVMS_E04021 applies to
                                    DECnet/SNA V2.1 only.


      3)  After installation of this ECO kit on an OpenVMS VAX
          V5.5-2x system, MAIL, REPLY, or any process that uses
          the $BRKTHRU system service may hang in MUTEX waiting for
          more BYTLM than it needs or has available.  Looking at the
          process from SDA will show it is waiting for significantly
          more BYTLM (R1) than the process has left (BUFIO byte
          count/limit), and significantly more than the message it
          is trying to output.

          The workaround for this is to make BYTLM larger.

      4)  This ECO kit should *NOT* be installed on an FT810
          system running OpenVMS VAX V5.5-2HF.  If it is installed,
          the system will not reboot.

     These issues are being addressed and will be corrected in a
     future version of OpenVMS VAX.


ECO KIT SUMMARY:

An ECO kit exists for Volume Shadowing on OpenVMS VAX V5.5-2 and
V5.5-2H4.

Problems Addressed in the VAXSHAD09_U2055 kit:

  o  A 'SET SECURITY' or 'SET ACL' command issued on volumes in a
     cluster places a high I/O load on the server process.  This
     exhausts paged pool and the AUDIT_SERVER goes into an RWPAG
     state.

     This problem is corrected in OpenVMS VAX V6.2.

  o  A field in the IRP that is used during Volume Processing is not
     initialized in clones of USER IOs.  If an error occurs, the
     code that determines the severity of the error can be misled by
     data in these fields.  The code can fail to locate the error
     and may return the IO as successful.  Since a zero byte count
     is also returned, a user would see an Incomplete Segmented
     Transfer error.  The fix is to initialize the field when the
     clone is allocated.

  o  While creating a page, a user process may be swapped out and
     returned with a different balance set slot.

     This problem is corrected in OpenVMS VAX V6.2.

  o  Listings may be difficult to read due to varied formats and
     misleading or missing comments.

     This problem is corrected in OpenVMS VAX V6.2.

  o  Certain applications that call $AUDIT_EVENT with ASTs turned
     off will be interrupted when $AUDIT_EVENT returns to the
     caller.

     This problem is corrected in OpenVMS VAX V6.2.

  o  The code relies on a page being present when it attempts
     to release a spinlock.  If the system is paging heavily,
     the page may not be available.

     This problem is corrected in OpenVMS VAX V6.2.

  o  Repeating wakeups from $SCHDWK show an accumulating drift over
     time.

     This problem is corrected in OpenVMS VAX V6.2.

  o  COPY and/or BACKUP of a DISK to a TMSCP-Served tape will fail
     when the tape device is placed in an MV state.  The failure
     does not occur if the same task is performed locally.

     COPY will fail with:  "SYSTEM-F-TAPEPOSLOST, magnetic tape
                            position lost"

     BACKUP will fail with: "-SYSTEM-F-DATALOST, data lost"

     This problem is corrected in OpenVMS VAX V6.2.

  o  To transition an OpenVMS process from the virtual balance set
     to the real balance set, the SPTEs (system page table entries)
     which describe its process PTE pages (process page table pages)
     need to be copied from saved memory back into the real balance
     slot from where they originally came.  This makes the process'
     P0 and P1 space accessible again.  SPTEs for the process page
     table pages describing the undefined area between P0 and P1
     must be represented by pre-initialized null values (actually,
     ERKW DZERO-type values).  When this undefined void area is
     exactly zero pages (i.e., P0 and P1 are tangent), the
     VBSS$READ_OPT2_VBSM routine takes the wrong branch, causing a
     VBSSERR bugcheck.  This fix adds a test for this case and
     takes the correct branch.

     This problem is corrected in OpenVMS VAX V6.2.

  o  When a process is switched from a real balance slot to a
     virtual balance slot, the allocation may fail.  This causes
     a VBSSERR bugcheck.

     This problem is corrected in OpenVMS VAX V6.2.

  o  The quota value may be incorrect when process quota
     (bytlm) is returned to a process for a system global
     section.

     This problem is corrected in OpenVMS VAX V6.2.

  o  System crashes may occur due to corrupted PTE entries.  The
     corruption appears to be Global Section Table Entries pointing
     to Global Section Descriptors.

     The problem occurs only if 4095 GBLSECTIONS are exceeded.  To
     check the number of Global Sections currently in use, add the
     values reported by the following SDA commands:

       -  SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL    !global sections

       -  SDA> VALIDATE QUEUE EXE$GL_GSDDELFL    !delete pending global
                                                  sections

       -  SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL    !group global sections

  o  Devices can remain allocated to processes that no longer
     exist.  The device remains unusable until the system is
     rebooted.

  o  If a previously shadowed disk is mounted with a MOUNT/OVER=SHADOW
     command and a new shadow set is created using this disk, OpenVMS
     VAX will attempt to create the old shadow set using the old
     physical device names.

  o  The system may crash with a NOBVPVCB bugcheck.  The crash occurs
     on the kernel stack with MTAAACP.EXE as the current image.

  o  The system may crash with an XQPERR bugcheck while dismounting
     a MAD drive.

  o  SUBTRACED errors are not correctly determined for images
     installed /HEADER_RESIDENT.

     This problem is corrected in OpenVMS VAX V6.2.

  o  Users of ORACLE [R] Rdb V6.1 may get ILLIOFUNC errors when
     performing IO to a Host-Based Shadowset whose members
     are served.

  o  The user will see a large number of the shadow copies being
     done by OpenVMS rather than the controller, even when both
     disks are on the same controller and the controller has DCD
     (Disk Copy Data) capabilities.

  o  If a three-member Shadowset has its index zero member as a copy
     target and all three members also require a merge, then when
     the copy completes, the merge does not take place.  The LBN for
     the just completed copy (the last LBN on the disk) is passed as
     the MERGE starting LBN, so it completes without doing any IO.

  o  Failures to start copies or restart copies may occur, usually
     after a node halt, shutdown or reboot.  Additional symptoms
     observed include inconsistent values for HBS_CIP when compared
     to SHADOW_MAX_COPY, negative values for HBS_CIP, and copies
     that should have continued instead starting over from the
     beginning.

  o  System hangs occur when IOs that are pending to a shadow set
     do not complete.

  o  UCB$L_MAXBCNT appears to be invalid for a shadowed disk.
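
The global-section check described earlier in this list amounts to
adding three counts and comparing the total with 4095.  A minimal
Python sketch, assuming the three counts have already been read by
hand from the SDA output; the function and parameter names are
hypothetical:

```python
GLOBAL_SECTION_LIMIT = 4095  # threshold quoted in the global-section item


def global_sections_in_use(system: int, delete_pending: int, group: int) -> int:
    """Add the three queue counts (system, delete pending, group)."""
    for count in (system, delete_pending, group):
        if count < 0:
            raise ValueError("queue counts cannot be negative")
    return system + delete_pending + group


def exceeds_global_section_limit(system: int, delete_pending: int, group: int) -> bool:
    """Return True when the combined count is above the 4095 threshold."""
    return global_sections_in_use(system, delete_pending, group) > GLOBAL_SECTION_LIMIT
```

For example, counts of 4000, 50 and 100 total 4150, which is above the
threshold, while counts of 3000, 10 and 200 total 3210, which is not.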


Problems addressed in the VAXSHAD07_061 Kit:

  o  In the VAXSHAD05 and VAXSHAD06 kits two new fields were added
     to the IRP data structure for shadow write logging information.
     This new IRP definition size conflicts with the IRP sizes of
     other images on the system that are not part of the SHADOW kits.
     This conflict may cause a variety of errors, including fatal
     bugchecks.  This fix changes the IRP definitions back to the SBB
     versions and adds some special definitions to the SHDRIVER for
     the new IRP fields.

  o  Fatal bugchecks from data structure corruption may occur due
     to the addition of the value 10 HEX to the corrupted field.
     Crashes are of various types and include node and cluster
     crashes, crashes due to invalid UCB addresses, invalid VCB
     addresses, invalid member IDs, and invalid number of devices.


Problems Addressed in the VAXSHAD06_061 Kit:

  o  When using PATHWORKS, data corruption may occur on the file
     container.  The corruption can be seen by running CHKDSK on the PC
     container disk.  Also using PCDISK to IMPORT and EXPORT files to
     and from the container will show a corrupted file when EXPORTed
     back to VMS.

  o  System crashes occur with INVEXCEPTN bugcheck at
     SCH$POSTEF+21.

     To correct this problem, a change was made in the IOC$SIMREQCOM
     routine to cause the destination of the IFNOWET test to
     initialize R4 before calling the IOC$SCHEDEF routine.
     IOC$SCHEDEF expects R4 to have the address of the user's PCB.


Problems Addressed in the VAXSHAD05_061 Kit:

  o  After a node crashes, it cannot mount a Host-Based Volume
     Shadowing virtual unit on reboot.  The error message usually
     returned is "volume not software enabled"; however, "Medium
     Offline" may also be seen.  A SHOW DEVICE will show that the
     Shadowset is in 0% merge, but SDA will show that a minimerge
     is pending.

  o  A double deallocation crash may occur as the result of MOUNT
     not properly initializing the Mounted Volume List pointer.
     This pointer may have a stale value as a result of two calls to
     SYS$VMOUNT from a single program.  The stale pointer will only
     cause a problem if the system is unable to allocate space for
     defining the logical name.

     NOTE:  Since cells are initialized at image activation, this
            problem should not occur as a result of DCL commands.

  o  Tape devices with stacker/loaders, such as the TF857, may take
     up to 6 minutes to rewind/unload/load the next tape.  In
     VAXSHAD01_061, a change was made to the behavior of MOUNT to
     take this delay into account.  However, a side effect of that
     change was that non-stacker drives may also wait 6 minutes
     before failing.

  o  A system may crash with an INVEXCEPTN during an SHDRIVER
     COPY_DATA_REPAIR copy operation.

  o  If the value of the ALLOCLASS SYSGEN parameter is not set and
     the user tries to use shadowing, a shadow volume can be created
     but members cannot be added to the shadow set.  No error
     messages are received up until a second member is added.  On
     the MOUNT command, the customer will receive the error
     messages:

        $ mount /system dsa500 /shadow=dkb400 alphavms015
        %MOUNT-I-SHDWMEMFAIL,  DKB400 failed as a member of the shadow set
        -SYSTEM-F-INCSHAMEM, incompatible shadow set member

     "Incompatible" is an inappropriate statement of the problem.  A
     more accurate message would be "missing allocation class," or
     "incorrect allocation class."

  o  If a shadow set member is dismounted at the same time from
     multiple nodes within a cluster, I/O to that shadow set may
     become stalled.

  o  Mount will not add shadow set members unless they are either
     MSCP or SCSI.

  o  Shadow set member expulsion was based on the time it took a
     fork & wait and a PACKACK to complete rather than on the actual
     time elapsed.  On some devices, particularly SCSI, where a
     PACKACK can take approximately one minute, the timeout was much
     too long.  Using the default value of 20 (seconds) for
     SHADOW_MBR_TMO would actually mean that it would take 20
     minutes to expel a member experiencing errors from a SCSI
     shadow set.

  o  SHDRIVER loss of synchronization may result in a crash where
     SHADDETINCON is triggered by the check at the end of
     MATCH_MASTER_SCB.  In this consistency check,
     SHAD$W_DEVSTS_PASSIVE_MV_CNTR is expected to be zero but is not.
     Another symptom is that the virtual unit UCB$W_RWAITCNT is
     zero.  Shadow set member counts of zero may also be seen.

  o  Crashes may occur in EXPEL_PACKACK_ANY with connections broken to
     all members and IRP$L_SHD_LOCK_FR5 = 1 (packack retries exhausted).

  o  All members of a shadow set become inaccessible at the same time and
     remain inaccessible for a period of time greater than "shadow
     member timeout" (SHADOW_MBR_TMO or SHADOW_SYS_TMO) seconds but
     less than MVTIMEOUT seconds.  All members subsequently become
     accessible within seconds of each other but not at exactly the same
     time.  This results in all but one member being expelled from the
     shadow set.

     This often occurs when changing HSJ microcode and all members are
     connected to the same HSJ.  When brought back online, polling will
     cause the devices to be found seconds apart which will result in
     all but one member being expelled.

  o  All members of a shadow set must be checked to see if they meet
     the criteria of being MSCP.  The original design did not allow
     for having no index zero member.

  o  When the mounting of full copy targets exceeds the SHADOW_MAX_COPY
     threads for a given node, other nodes with the shadow set mounted
     do not pick up the copy work.

  o  In a cluster, using $PROCESS_SCAN explicitly or implicitly with the
     DCL 'SHOW USER' command sometimes causes a system crash due to an
     ACCVIO in kernel mode or an IVSSRVRQST bugcheck.

  o  When a node with a SCSI bus boots, it resets the SCSI bus.  In a
     multi-host SCSI cluster, this can cause the other node to
     experience I/O failures.  Normally, this results in a brief mount
     verification.  The I/O is retried, succeeds, and there is
     no serious consequence.  However, if the other node is in the
     process of booting and the system disk is a shadow set, the
     system will crash.

  o  A PGFIPLHI bugcheck may occur in the SHADOW_SERVER process at
     the REMQUE in K_GET_COPYSHAD_IRP.  On OpenVMS VAX, the PC is
     A0E and the VA is 274.

  o  A page setup module which draws a frame and company logo on each
     page of output is used on a queue pointing to an LN03.  This page
     setup module works on OpenVMS VAX Version 5.5-2 and prior versions.
     However, with VAXQMAN8_U2055 (CSCPAT_1165) or OpenVMS VAX Version
     6.1 installed, this page setup module causes the printer to
     continually spew out paper with only the output from the page setup
     module.  This continues until the entry is deleted from the queue.

  o  Due to an inadequate synchronization mechanism, the MONITOR DISK
     command can go into an infinite loop on multi-processing machines.

  o  A race condition may occur in a VMScluster.  This happens
     most frequently on clusters where the 'SET AUDIT/SERVER=NEW'
     command is issued repeatedly.  The race condition presents
     itself as one or more of the audit servers within the
     cluster continuing to use the old audit journal rather
     than using a newly created journal.

  o  A system may crash with a PGFIPLHI bugcheck with a "PAGE FAULT
     at IPL too high" error message.
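
The SHADOW_MBR_TMO item earlier in this list can be illustrated with
simple arithmetic.  The Python sketch below contrasts a timeout that
counts fork & wait plus PACKACK cycles with one that measures elapsed
seconds.  It is a simplified model, not the SHDRIVER logic: the
function names are hypothetical, and the elapsed-time model assumes
the timeout is checked each time an attempt completes.

```python
import math


def expulsion_delay_by_cycles(shadow_mbr_tmo: int, seconds_per_cycle: float) -> float:
    """Delay when the timeout value is consumed one unit per retry cycle."""
    return shadow_mbr_tmo * seconds_per_cycle


def expulsion_delay_by_elapsed_time(shadow_mbr_tmo: int, seconds_per_cycle: float) -> float:
    """Delay when the member is expelled at the first cycle that ends
    at or after shadow_mbr_tmo seconds of real time (assumed model)."""
    cycles = max(1, math.ceil(shadow_mbr_tmo / seconds_per_cycle))
    return cycles * seconds_per_cycle
```

With the default SHADOW_MBR_TMO of 20 and a cycle of roughly 60
seconds, the cycle-counting model gives 1200 seconds (the 20 minutes
quoted above), while the elapsed-time model gives 60 seconds.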


Problems Addressed in the VAXSHAD04_061 Kit:

  o  When booting two or more systems simultaneously from shadowed
     system disks, the systems may appear to hang.  Crashing the
     systems and examining the crash dumps indicates that shadowing
     driver blocking AST routines have not run.

  o  When a node runs out of SHADOW_MAX_COPY threads while mounting
     new copy target units, other nodes in the cluster that have
     available SHADOW_MAX_COPY threads will not pick up the copy
     work.  This results in the copy not being started for copy
     members that are added to shadow sets.


Problems Addressed in the VAXSHAD03_061 Kit:

  o  A double-deallocation crash may occur as the result of MOUNT not
     properly initializing the MTL pointer.  This pointer had a stale
     value as a result of 2 calls to SYS$VMOUNT from a single program.
     The problem will not happen as a result of DCL commands, as the
     cells are initialized at image activation.  The stale pointer
     will only cause a problem if the system is unable to allocate
     space for defining the logical name.

  o  An OPCOM message was being output even though /NOASSIST was
     specified in the MOUNT command.  This caused problems for UETP.

  o  When booting from a Controller-Based System disk for the first
     time as a Host-Based System disk, boot fails and a SHADBOOTFAIL
     Bugcheck occurs.  A SHADBOOTFAIL will also occur if the
     SHADOW_SYS_UNIT is changed at boot time.

  o  During a copy operation the system may crash with an ACCVIO.

  o  The volume of messages printed during SHDRIVER volume
     processing has been reduced to make the messages that are
     printed more meaningful to the user.  This involved minor
     modifications to SHDRIVER to suppress messages that do not
     indicate actual problems.  No message text has been modified
     or deleted; only the frequency with which messages are printed
     has changed.

  o  The path selection logic for DUDRIVER had a timing problem that
     caused devices to be mounted by an MSCP server, even though a
     local controller could be used.  Although this symptom could
     still appear under extreme circumstances, the majority of devices
     should now find the local controller.

  o  In a large LAVC (Local Area VAXcluster), after one or more nodes
     leave the cluster, state transition times can be excessive, and
     the following messages may be repeatedly sent to the consoles of
     the various nodes:

          %CNXMAN, proposing reconfiguration of the VAXcluster

          %CNXMAN, aborting VAXcluster state transition

     The state transition, which normally should complete within 1-3
     seconds, instead may take 15-55 seconds or more.

  o  Incorrect MSCP-served disk synchronization would cause I/O to an
     MSCP-served disk to get stalled on an internal queue and later
     restarted.

  o  An internal routine, MOVE_SERVER, had a sequencing problem and
     could cause stalled I/O to a served shadow-set member.

  o  MSCP server crashes may occur in large clusters.


Problems Addressed in the VAXSHAD01_061 Kit:

  o  A delay of up to six minutes can occur before a
     device-not-ready condition is reported during cartridge volume
     switching on non-SCSI (Small Computer System Interface)
     TX867-type devices.

  o  Some of the OpenVMS VAX console executive messages have changed
     to mixed upper and lower case letters for OpenVMS VAX V6.0
     message text.  The result is that current VCS scan files will not
     match the console text, and VCS alarms will fail to trigger.
     (Please see the ECO kit release notes for more information and
     instructions regarding this fix.)

  o  There is no synchronization between SHADOW_PROCESSING and
     INVALIDATE_ALL_ENTRIES, which allows these two code threads to
     run simultaneously.  This can cause a system crash due to the
     fact that the SHADOW_PROCESSING thread may remove a member from
     a multimember shadow set and the INVALIDATE_ALL_ENTRIES thread
     is not aware that the member has been removed.  The system
     crash occurs in RESTORE_WLE because no Write Log table
     exists.

  o  When shadow set members are not available for SHADOW_MBR_TMO
     seconds, they should be expelled from the shadow set.
     Sometimes when two members of a three-member set enter this
     condition, only one member will be successfully removed.  The
     other member will not be removed, and this will cause the
     virtual unit to hang until the errant member returns or it is
     manually removed from the set via push button.

  o  In Volume Shadowing for OpenVMS Alpha Version 6.1, several
     changes were made to the assisted merge (minimerge)
     functionality.  These changes disabled minimerge functionality
     across mixed architecture VMSclusters.  With minimerge
     disabled, shadowing continued to function normally, except that
     a full merge was always done when a merge operation occurred.
     Full merges take considerably longer than minimerges.   If
     minimerge functionality is desired, Digital recommends that
     this kit be installed across any VMSclusters that contain an Alpha
     node running OpenVMS Alpha Version 6.1.

     Mixed-architecture VMSclusters that are running OpenVMS Alpha
     Version 6.1 must apply this kit and reboot the entire cluster
     simultaneously.  In these cases, rolling upgrades are not
     supported.

  o  Prior to this remedial kit, if attempts were made to mount an
     RZ28B disk device with an RZ28 in the same shadow set, Volume
     Shadowing detected different device IDs and may not have
     allowed the devices to be mounted.  This behavior applied only
     to an RZ28/RZ28B shadow-set combination when connected with a
     local SCSI controller.  Since RZ28 and RZ28B are different
     device types but can be shadowed, the checking for shadow-set
     membership in the host-based shadowing software needed to be
     modified.

     This remedial kit enables the combination of RZ28 and RZ28B
     devices in a shadow set, as long as they are connected to like
     controllers.  With the use of SCSI devices, like controllers
     are required because geometry can vary from controller to
     controller.  Digital recommends that SCSI shadow sets be
     configured across like controller types.  Existing SDI and DSSI
     configurations are unaffected; if they are not using SCSI
     drives and are shadowing SDI devices across different
     controllers, these configurations will continue to work
     without this remedial kit.

     VMSclusters with shadowed SCSI disks and mixed-architecture
     VMSclusters running OpenVMS Alpha Version 6.1 must apply the
     kit and reboot the entire cluster simultaneously, so that the
     entire VMScluster is running the same version of Volume
     Shadowing software.  The kit is required for both VAX and Alpha
     nodes.  Do not mount shadow sets containing RZ28 and RZ28B
     devices without first applying this kit.

  o  If a Shadowset Virtual Unit is dismounted during a full copy,
     the full copy target's SCB is incorrectly written.  This allows
     a subsequent mount of that shadow set member to succeed as if
     the copy had completed.

  o  System crashes may occur in RESTORE_WLE because there is no
     Write Log table.  In fact, a member has been removed from the
     set.  This problem is similar to, but distinct from, the DU/SH
     Synch problem that causes the same symptom.

  o  When members are not available for SHADOW_MBR_TMO seconds, and
     other members are available, the unavailable members should be
     ejected from the shadow set.  In certain configurations, with
     the current version of the driver, should two members of a
     three-member set enter this condition, only one member will be
     successfully removed.  The other member will not be removed
     and the virtual unit will hang until the errant member returns,
     or it is manually removed from the set via push button.

     This behavior has been fixed in this kit.  Any members that
     remain unavailable for greater than SHADOW_MBR_TMO seconds will
     be fully expelled from the set.

  o  A device-not-ready condition for magtapes was not reported
     until a delay of up to 6 minutes had expired.
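
The expulsion rule stated in this section (members unavailable for
more than SHADOW_MBR_TMO seconds are expelled while other members
remain available) can be sketched in Python as follows.  This is an
illustrative model only; the function name, the device names, and the
use of None to mark an available member are assumptions made for the
example.

```python
from typing import Dict, List, Optional


def members_to_expel(
    unavailable_seconds: Dict[str, Optional[float]],
    shadow_mbr_tmo: float,
) -> List[str]:
    """Return every member that should be expelled from the shadow set.

    unavailable_seconds maps a member name to how long it has been
    unavailable, or to None if the member is currently available.
    """
    # Expulsion applies only while at least one other member is available.
    if not any(seconds is None for seconds in unavailable_seconds.values()):
        return []
    # Expel all members past the timeout, not just the first one found.
    return sorted(
        name
        for name, seconds in unavailable_seconds.items()
        if seconds is not None and seconds > shadow_mbr_tmo
    )
```

In a three-member set where two hypothetical members, DUA2 and DUA3,
have been unavailable for 25 and 30 seconds with SHADOW_MBR_TMO set to
20, both are returned for expulsion rather than only one.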


Problems Addressed in the VAXSHAD02_060 Kit:

  o  A SHADDETINCON was caused by the X-64A1 check-in, because the
     wrong GPR was used when an unlikely system address was stored
     into the IRP.


Problems Addressed in the VAXSHAD01_060 Kit:

  o  If SHDRIVER encounters a situation where more than one member
     of a three-member shadow set goes into error recovery at the same
     time, and they cannot be brought back into the shadow set
     (i.e., loss of connectivity, media offline, write locked
     device, etc.) SHDRIVER will expel one of the members and crash
     with a SHADDETINCON when it cannot update the SCB on the
     remaining members.

  o  When all shadow set members are write locked, a bugcheck will
     occur due to R4 being destroyed across a JSB to
     SHSB$GET_CLEAN_IRP.  This fix preserves that register.

  o  The SHADOW_MAX_COPY SYSGEN parameter is used to set how many
     merge/copy threads may be started at the same time on a node.
     This was not working.  Systems would start more than
     SHADOW_MAX_COPY number of threads.

  o  SHDRIVER system disk member timer issues and R2/R5 corruption
     problems:

     1.  The SHSB$MATCH_MASTER_SCB routine makes improper use of
         SHSB$PAUSE.  The use of SHSB$PAUSE causes the SHAD (in R2)
         not to be preserved when the time delay is invoked
         (since it forks), so the resulting value in R2 is
         indeterminate.

     2.  The SHSB$MATCH_MASTER_SCB routine makes improper use of
         SH$TIME_DELAY.  An input requirement of SH$TIME_DELAY is
         to have a UCB in R5.

     3.  The SH$ABORT_VP routine makes improper use of
         SH$TIME_DELAY.  The use of SH$TIME_DELAY causes the SHAD
         (in R2) not to be preserved when the time delay is invoked
         (since it forks), therefore the resulting value in R2 is
         indeterminate.

     4.  The benefit of reassembling a multiple member system disk
         shadow set is lost to some configurations if the current
         fixed amount of time expires and all of the former members
         of the shadow set are not available.  This has caused
         escalations to be raised.  This kit addresses that specific
         behavior.  It also enables the member timeout for the
         system disk to be differentiated from that for other
         disks, and makes the currently hardcoded wait of FF
         seconds to connect to all members of an existing system
         disk a user-controlled variable.

  o  SHDRIVER MVTIMEOUT after member error and R5 corruption
     problems:

     1.  The spontaneous removal of one shadow set member of a multiple
         member set due to a fatal error causes some cluster nodes to
         hang the virtual unit until the MVTIMEOUT time expires.

     2.  In SHSB$VALIDATE_SHADOW_SET, the wait loop at 130$ does not
         correctly restore the contents of R5 to be the VU VCB
         after a call to SHSB$PAUSE.

  o  WLE_POST_PROC is not done on all the clones.  This causes
     unnecessary allocation of new Write Log Entries.  The Write Log
     INUSE bit is never cleared, so the table has to be expanded.
     Once the table expands to MAX, Write Logging is disabled.  When
     Write Logging is turned back on, the cycle starts over.  All the
     entries in the controller will be exhausted, forcing Write Log
     Exhaustion handling, and in some cases the controller will be
     reset.

  o  While doing INVALIDATE_ALL_ENTRIES, if the READ of LBN #1 fails
     or WLG has been turned off, a branch goes to the wrong
     location.  This results in issuing an I/O with no READY clones,
     and the system will wait forever with the SEQCMD lock held and
     RWAITCNT bumped.
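
     The SHADOW_MAX_COPY item in the list above describes a per-node
     cap on concurrent merge/copy threads.  A minimal sketch of such
     a cap is shown below.  It is an illustrative Python model under
     stated assumptions, not SHDRIVER code; the class and method
     names are hypothetical.

```python
# Hypothetical model of a per-node copy limit; not SHDRIVER code.

class CopyThrottle:
    """Caps the number of merge/copy threads active on one node."""

    def __init__(self, shadow_max_copy):
        self.limit = shadow_max_copy
        self.active = 0

    def try_start(self):
        """Start a thread only if the node is below its limit."""
        if self.active >= self.limit:
            return False          # caller must defer the copy
        self.active += 1
        return True

    def finish(self):
        """Release a slot when a merge/copy thread completes."""
        if self.active > 0:
            self.active -= 1


throttle = CopyThrottle(shadow_max_copy=2)
results = [throttle.try_start() for _ in range(3)]
print(results)                 # the third request is refused
throttle.finish()
print(throttle.try_start())    # a slot is free again
```

     The reported failure corresponds to a limit check that is
     skipped or bypassed, so more threads start than the parameter
     allows.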


Problems Addressed in the VAXSHAD08_U2055 Kit:

  o  After installation of CSCPAT_0269 V2.7 (VAXSHAD07_U2055), the
     system may crash with a SHADDETINCON bugcheck.  The bugcheck
     occurs when a disk is removed from a mounted shadow set.


Problems Addressed in the VAXSHAD07_U2055 Kit:

  o  Write Log Usage fixes:

     1.  The first problem symptom is that user I/O to a virtual
         unit may intermittently hang on any node (usually only
         one) that has a multiple member virtual unit mounted.  The
         hang can occur with no other overt error symptoms evident
         in either the error log or as seen by analyzing the live
         system.

     2.  The second problem symptom is less apparent, in that the
         resources used for the write history management function
         are managed in a more efficient manner.

  o  When one member of a multiple-member shadow set encounters a
     fatal device error, the node that discovers the initial problem
     will successfully expel that device from the set.  However,
     other nodes that are under heavy I/O loads when the device is
     expelled may occasionally fail to recover the full membership.
     This will cause the virtual unit to hang until the MVTIMEOUT
     time limit is reached.


Problems Addressed in the VAXSHAD05_U2055 Kit for OpenVMS VAX V5.5-2:

  o  A documentation change was made to the VAXSHAD04_U2055
     kit to remove an incorrect reference.


Problems Addressed in the VAXSHAD04_U2055 Kit for OpenVMS VAX V5.5-2:

  o  When a host receives a controller error, Volume Shadowing Phase
     II processing removes whatever device is at SHAD index 0 even
     if this member was not the one that experienced the controller
     error.  Once the index 0 member is gone, all other controller
     errors are ignored.

  o  The ability to switch to the current master member of a system
     disk shadow set is limited to certain configurations of
     controller/adapter types.  Crash dumps that were correctly
     written (according to console output) cannot be found for
     analysis when using an HBVS multiple-member system disk shadow
     set.

  o  Applications can hang or experience I/O transfer errors when
     using multiple-member shadow sets that are connected in such a
     way that segmented I/O transfers are needed.  This has been
     reported on systems running WordPerfect[TM].

  o  An INVEXCEPTN crash can occur if the allocation of a clone
     chain fails.  If a FANOUT_ALLOCATION_XXX request fails, the
     MIRP is still linked to the active queue, which causes the
     next REMQUE to fail.

  o  If a system that currently holds the WATCHER lock crashes while
     it is validating the status of a Host-Based Volume Shadow Set
     that is mounted cluster-wide and another node assumes the WATCHER
     lock, an IPL 8 system hang can occur.

  o  A SYSDUMP.DMP file that appears to be written correctly can be
     invalid when the boot device and the master member of the system
     disk shadow set diverge.  The device that the system dump is
     written to has always been the boot device.  The SHDRIVER.EXE in
     this kit allows the system dump to be written to a member of the
     system disk shadow set other than the boot device.  Upon successful
     write completion, the unit number will be displayed on the console.

  o  The VAX 7000 had been restricted to using the boot device as the
     only valid dump device in a prior remedial image.  Additionally,
     proper operation was not allowed at shutdown or when a crash dump
     needed to be written because an incorrect message was sent
     concerning the path to the system disk.

  o  Occasionally, all of the former members of a system disk shadow
     set will not return upon a system reboot.  This problem will
     occur only if the virtual unit is not otherwise mounted in the
     cluster at boot time.

  o  Under certain conditions, once a virtual unit exceeds the mount
     verify time-out time, the correct behavior does not occur.
     Indeterminate behavior results from the use of a corrupted SHAD
     pointer because fork context requirements are not observed.

  o  If a node is booting into a cluster and the boot device being
     used is already mounted in the cluster as a member of a virtual
     unit with a different virtual unit number, the node is
     incorrectly allowed to continue to boot into this cluster.

  o  Under certain circumstances, the SHADOW_MAX_COPY SYSGEN
     parameter does not regulate the number of copies a particular
     node will control.  This effectively nullifies the significance
     of setting any value in SHADOW_MAX_COPY.

  o  Configurations that consume a great number of event flags and
     create a large number of multiple-member shadow sets (i.e.,
     greater than 50) may experience a system crash.

  o  Inadvertent placement of the SCB (System Control Block) can
     adversely affect the best time calculation needed for a full
     merge operation.  This will, in turn, adversely affect the
     total time it takes to perform the full merge operation.

  o  If enough write I/O operations to cause the write log table to
     go beyond its expansion limit of 4K are issued to a
     multiple-member shadow set that has write logging enabled, the
     set may hang. This condition can occur with no evident error
     symptoms.

  o  If a member of a three-member shadow set loses its connection
     for SHADOW_MBR_TMO time, a decision to remove that member is
     initiated.  Should either of the remaining members not be able
     to complete an SCB (System Control Block) update, the removal
     operation may occasionally result in a SHADDETINCON crash.

  o  When a multiple-member system disk shadow set is in use and a
     number of nodes are rebooted at the same time, sometimes the
     path to one of the non-boot device members is used before it
     has been properly initialized.  This causes a race condition
     which may result in an MSCPCLASS bugcheck.


Problems Addressed in the VAXSYS14_061 Kit:

  o  There is a race condition that may occur when a CFCB (Cache File
     Control Block) is being deleted due to XQP action and cache
     space is being reclaimed from a LIMBO file.

  o  Disk corruption can occur when heavy open/read/write/close/delete
     operations are occurring.

  o  At some point after a node CLUEXITs, two or more cluster nodes
     crash with LOCKMGRERR bugchecks.

  o  When two or more VAX or Alpha nodes boot at the same time, one
     or more of them may crash.


Problems Addressed in the VAXSYS16_U2055, VAXSYS15_U2055,
  and VAXSYS14_U2055 Kits:

  o  Two new fields were added to the IRP data structure for shadow
     write logging information. This new IRP definition size
     conflicts with the IRP sizes of other images on the system.
     This conflict may cause a variety of errors, including fatal
     bugchecks.  This fix changes the IRP definitions back to the
     SBB versions.

     This problem is corrected in OpenVMS VAX V6.2.


Problems Addressed in the VAXSYS13_U2055 Kit:

NOTE:  According to OpenVMS Engineering, the fixes contained
       in VAXSYS13_U2055 have been included in OpenVMS VAX V7.0.

  o  System crashes may occur due to corrupted PTE entries.  The
     corruption appears to be related to Global Section Table Entries
     pointing to Global Section Descriptors.

     The problem occurs only if 4095 GBLSECTIONS is exceeded.  To
     check the number of Global Sections currently in use, add the
     following values:

          o  SDA> VALIDATE QUEUE EXE$GL_GSDSYSFL !global sections

          o  SDA> VALIDATE QUEUE EXE$GL_GSDDELFL !delete pending global sections

          o  SDA> VALIDATE QUEUE EXE$GL_GSDGRPFL !group global sections
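
     The arithmetic can be sketched as follows, assuming each
     VALIDATE QUEUE command above yields an entry count for its
     queue.  The three counts in this Python example are made-up
     placeholders, and the function name is hypothetical.

```python
# The three counts below are made-up placeholders; substitute the entry
# counts obtained for each queue on the system being checked.
GBLSECTIONS_LIMIT = 4095   # threshold quoted in the text above


def global_sections_in_use(sys_count, del_count, grp_count):
    """Sum the system, delete-pending, and group global section counts."""
    return sys_count + del_count + grp_count


total = global_sections_in_use(sys_count=3900, del_count=12, grp_count=210)
print(total, total > GBLSECTIONS_LIMIT)
```

     A total above 4095 indicates a system that is exposed to the
     problem described above.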

Problem Addressed in the VAXSYS12_U2055 Kit:

  o  Due to an inadequate synchronization mechanism, the MONITOR DISK
     command can go into an infinite loop on multi-processing
     machines.


Problem Addressed in the VAXSYS11_U2055 Kit:

  o  The system crashes with a PGFIPLHI bugcheck and the message
     "Pagefault at IPL too high".  The VA is pointing to a
     CCB (Channel Control Block) and the PC is located within
     the MBDRIVER module.


Problem Addressed in the VAXSYS10_U2055 Kit:

  o  Performance may be degraded due to excessive kernel mode time
     being spent in MMG$FREWSLE attempting to find a working set
     page to replace.


Problems Addressed in the VAXSYS09_U2055 Kit:

  o  In a small working set, it is possible for the EXE$PSCAN_NEXT_PID
     routine (which is called by $GETJPI) to take a page fault at IPL 8.
     This causes a PGFIPLHI bugcheck.  The page referenced is in the
     PROCESS_SCAN context block (PSCANCTX$ data structure) in process
     virtual address space.

  o  The $SETIMR and $SCHDWK system services, which request timer
     interrupts, may cause a system to hang.  This occurs when a
     time that has already passed is specified for the wake to
     occur.


Problems Addressed in the PRCMGT$01_U2055 Kit:

  o  A system crash may occur at POSIX$KERNEL+3B371 with POSIX$DCL as
     the current image.  The crash is provoked when a user logs in
     with /CLI=POSIX$CLI.  The DCL command may cause the system to
     crash, or the process to evaporate.  Occasionally, the crash will
     occur following a few carriage returns.

     This problem is corrected in OpenVMS VAX V6.0.

  o  Fixes for various problems in $GETJPI (ECO 15):

     The following problems have been reported in the $GETJPI and
     $GETJPIW system services (executive routine EXE$GETJPI):

     ·  Process hangs while waiting for $GETJPI to complete

        A process might wait forever in LEF state while attempting to
        retrieve an item which required access to another process's P1
        space.  While this kit includes changes which fix some
        instances of this problem, there is the possibility it may
        still occur.  Should the problem persist after installing this
        kit, one may work around the hang by revising the application
        and adding a timer request AST and recovery routine.  For more
        information, refer to an article in the OPENVMS database using
        a search string of:

             Application and $GETJPIW and $GETJPI and Hang

     ·  SSRVEXCEPT bugchecks

        There were several instances where EXE$GETJPI would try to
        access data structures formerly assigned to a now deleted
        process.  Most frequently, the problem showed up as an access
        violation at EXE$GETJPI+712 while trying to retrieve the
        external PID.

     ·  PGFIPLHI bugchecks

        This involved another instance of access to the former data
        structures of a deleted process.   In this case though,
        EXE$GETJPI attempted recovery.  The recovery was incorrect and
        would lead to unreleased spinlocks, high IPL access to paged
        code and other problems.

     ·  KRPEMPTY bugchecks

        This was yet another instance of access to a deleted process.
        If the process was selected by a "wildcard" PID, EXE$GETJPI
        would attempt to allocate an entry from the KRP lookaside list
        without having released a previous entry.

     ·  Stack corruption

        The kernel stack could be corrupted if the target process of a
        $GETJPI request was out of AST quota.

     ·  Incorrect AST quota

        AST quota could be gained or lost on an SMP system because of
        access via non-interlocked instructions.

     ·  Final status of 0 in R0

        A user could get a final status of 0 in R0 if the PHD of a
        target process was swapped out.

     These problems are corrected in OpenVMS VAX V6.0.
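
     The workaround mentioned for the $GETJPI hang, a timer request
     with a recovery routine, follows a general pattern: bound the
     wait and run recovery code if the request does not complete in
     time.  The Python sketch below illustrates only that pattern.
     It is not OpenVMS code, it does not use system services or
     ASTs, and every name in it is hypothetical.

```python
import threading


def call_with_timeout(request, timeout_seconds, on_timeout):
    """Run request() in a worker thread; call on_timeout() if it stalls."""
    result = {}

    def worker():
        try:
            result["value"] = request()
        except Exception as exc:      # hand errors back to the caller
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_seconds)      # bounded wait instead of waiting forever
    if thread.is_alive():
        return on_timeout()           # recovery routine
    if "error" in result:
        raise result["error"]
    return result["value"]


release = threading.Event()
print(call_with_timeout(lambda: "process data", 1.0, lambda: "recovered"))
print(call_with_timeout(release.wait, 0.1, lambda: "recovered"))
release.set()   # let the stalled worker finish
```

     The first call completes normally.  The second simulates a
     request that never returns, so the recovery routine runs after
     the timeout.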

Problem Addressed in the VAXSYS01_2H4055 Kit:

  o  VAX 4000 Model 100A, 500A, 600A and 700A will no longer be able
     to boot via the Q-bus after installation of DECnet/OSI V5.5 or
     V5.6. These versions of DECnet/OSI eliminate code for support of
     new hardware in OpenVMS VAX V5.5-2H4.


Problems Addressed in the VAXSYSL04_U2055 Kit:

  o  The PE1 parameter which was previously used to control the size
     of trees to be remastered has been changed.  If a negative
     value is placed in the parameter, the RRSCAN routine will exit
     without doing any scans or remastering.

  o  When the system is scanning for trees to remaster due to a
     change in cluster membership, RM_QUOTA may be exhausted.  When
     this occurs, possible RSB queue corruption may result.

  o  The system crashes with a LKBREFNEG bugcheck when a parent sub
     lock count exceeds 32K on a $DEQ.

  o  The system crashes with a RSBREFNEG bugcheck when a parent sub
     resource count exceeds 32K on a $DEQ.

  o  During dynamic remastering, performance is degraded when large
     lock trees are moved.

  o  The LKID_MSK routine which is used to mask off the LKID (Lock
     ID) from the SEQN is incorrectly generated in DSTRLOCK.  This
     can cause the LKID Validation Routines to incorrectly indicate
     that a LKID is invalid.

  o  Locks are sometimes granted out of order during remastering of
     a resource.

  o  The "Recover" privilege is not being correctly checked.  This
     prevents recovery processing from recovering databases after
     node failures.

  o  When the resource for a two phase conversion in progress is
     canceled, a fatal bugcheck will occur if the resource's BLOCKAST
     count is invalid.

  o  The activity scan rate of the Lock Manager has been changed
     from 1 second to 8 seconds to reduce Lock Manager overhead and
     make the tree moving algorithms more conservative.
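
     The LKBREFNEG and RSBREFNEG items above involve counts that
     exceed 32K.  This summary does not say how those counts are
     stored.  The Python sketch below assumes, purely for
     illustration, a signed 16-bit field, and shows how such a
     field turns negative when incremented past 32767.  The
     function name is hypothetical.

```python
import ctypes


def bump_16bit_count(count):
    """Increment a count as a signed 16-bit field would store it."""
    # ctypes integer types do not check for overflow, so the value wraps.
    return ctypes.c_int16(count + 1).value


print(bump_16bit_count(32766))   # still fits
print(bump_16bit_count(32767))   # wraps to a negative value
```

     Under that assumption, a reference count that wraps in this
     way would appear negative to any later consistency check.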


Problems Addressed in the VAXMONT01_061 Kit:

  o  When the 'MONITOR DISK' command is issued on a system with DFS
     devices mounted, only the first three characters of the DFS
     disk name are displayed correctly.  The last character is
     often displayed as a non-printable character or as an escape
     sequence.  This may cause terminal lock-ups, resetting of
     terminal characteristics or other unexpected terminal side effects.

  o  The 'MONITOR DISK' command may appear to hang when monitoring
     a system with more than 800 disks.  An error occurs, but the
     error status is not displayed.  The hang may also occur when a
     MONITOR CLUSTER command is issued.

  o  Due to an inadequate synchronization mechanism, the 'MONITOR
     DISK' command can go into an infinite loop on multi-processor
     machines.

  o  Use of the 'MONITOR PROCESS' command in a local environment will
     fail if the SYSGEN parameter MAXPROCESSCNT is set to allow more
     than 1040 processes.  When Virtual Balance Slots were added in
     OpenVMS V6.0, this number dropped to 978.

  o  In a mixed-version OpenVMS Cluster, the following MONITOR
     command will crash the target V6.0 node if it is issued
     from a V5.5-2 node:

          $ MONITOR STATES,POOL,DECNET,LOCK /NODE=V6.0_node


Problem Addressed in the VAXMONT03_U2055 Kit:

  o  The image to correct the MAXPROCESSCNT problem should have
     been included in the VAXMONT02_U2055 kit.  It was not.


Problems Addressed in the VAXMONT02_U2055 Kit:

  o  An error occurs after the following MONITOR command is
     used:

          $ MONITOR [CLASS] /NODE={nodelist}

     The error indicates that the connection to a remote node
     has been lost and the collection activity terminates for
     that node.

  o  The MONITOR process class will not function if the SYSGEN
     parameter MAXPROCESSCNT is larger than 1040.  The following
     errors will be returned:

          %MONITOR-E-COLLERR, error during data collection
          -SYSTEM-F-BADPARAM, bad parameter value


Problem Addressed in the VAXMOUN05_U2055 Kit:

  o  A delay of up to six minutes can occur before a
     device-not-ready condition is reported during cartridge volume
     switching on non-SCSI (Small Computer System Interface)
     TX867-type devices.


Problems Addressed in the VAXMOUN04_U2055 Kit:

  o  RE-INITIALIZATION errors are reported to users of SCSI
     tape drives attached to an HSx controller.  This occurs
     if multiple SCSI tapes are attached to the HSx and all the
     tapes are at or near PEOT and the connection to the HSx is
     broken.

  o  A tape drive will sometimes fail over to another HSx
     controller after the tape is dismounted.

  o  Numbers greater than 9999 which are randomly generated
     by HSx devices may cause the system to crash.

  o  Packet Acknowledgements (PACKACK) issued on client nodes
     that are using a specified preferred path will fail if
     the specified path is not the current primary path and
     the path cannot be changed because the disk is online
     through another path.

  o  In Controller Based Shadowing, mounting a disk named
     DUx or a tape named MUx causes the following error
     message to appear:

     %MOUNT-W-CBSNOTSUPTD, Attention - Phase I Shadowing is not supported
                           as of OpenVMS VAX V6.1
     %MOUNT-I-MOUNTED, SCRTCH mounted on _$5$MUA0: (MOOSHEAD)

     This error message should only appear when an attempt is
     made to mount a DUS device.

  o  A user is unable to read the second volume of backup tapes
     written under OpenVMS V5.3.  However, the tapes can be
     read successfully on OpenVMS VAX V5.5-1.

  o  If a logical is specified on a MOUNT shadow set command line
     and this logical has the same name as one of the shadow set
     members, then the following command sequence will fail with
     an INCONSDEV mount error which will cause a system crash:

          $ MOUNT/SYSTEM DSA0/SHADOW=$1$DIA0: TWI_TEST $1$DIA0
            %MOUNT-I-MOUNTED, TWI_TEST     mounted on _DSA0:
            %MOUNT-I-SHDWMEMSUCC, _$1$DIA0: (SPRING) is now a valid
                     member of the shadow set
          $ MOUNT/SYSTEM DSA0/SHADOW=$1$DIA1: TWI_TEST
            %MOUNT-F-INCONSDEV, inconsistent device types

  o  If no operator is present to respond, MOUNT within a subprocess
     will fail with the following message:

          %MOUNT-F-BATCHNOOPR, No operator available to service batch
                               request

  o  When MOUNT implicitly allocates a device to a child process
     (i.e., a channel is opened to the device), ownership of the
     device changes to the parent process on a dismount.  A
     subsequent mount of the device by the child process will fail
     because the device is now allocated to the parent.

  o  The new message "Another Volume Set of the Same Label is
     Already Mounted" has been added.

  o  If a tape device does not support compaction, then the
     MOUNT/FOREIGN/NOCACHE command mounts the device with
     CACHE ENABLED.

  o  MOUNT only waits 10 seconds to allow SCSI magtape
     devices to become ready before determining that the
     device is off line.  Tx8x7 tape devices may take
     up to 6 minutes to become ready during a volume
     switch.

  o  MOUNT is unable to skip a number of records greater
     than 8000 hexadecimal when it tries to reposition
     tapes after a label verification in mount verify.

  o  A tape initialized with the following command will not be
     mounted if the user is not the owner, even if all privileges
     are enabled (i.e., user is SYSTEM):

          $ INITIALIZE/LABEL=(VOLUME_ACCESSIBILITY:"%")/OWNER=[100,100] -
            /PROTECTION=(S:RWED,O:RWED,G,W)

Files on this server are as follows:
»vaxshad09_u2055.README
»vaxshad09_u2055.CHKSUM
»vaxshad09_u2055.CVRLET_TXT
»vaxshad09_u2055.a-dcx_vaxexe