5    Rolling Upgrade

TruCluster Server Version 5.0A and higher provides the infrastructure that makes a rolling upgrade possible.

For more detailed information about using the rolling upgrade process to install a new operating system or TruCluster software version, see the Version 5.1 or higher Cluster Installation manual.

Note

If you have not yet created your cluster, Compaq recommends that you patch your system first. See Section 3.3 for this time-saving procedure.

This chapter provides the following information:

  An overview of rolling upgrades, tagged files, and the version switch (Section 5.1)

  A description of each rolling upgrade stage (Section 5.2)

  The rolling upgrade procedure (Section 5.3)

  How to display the status of a rolling upgrade (Section 5.4)

  How to undo a rolling upgrade stage (Section 5.5)

  How to remove patches installed during a rolling upgrade (Section 5.6)

5.1    Overview

A rolling upgrade is a software upgrade of a cluster that is performed while the cluster is in operation. One member at a time is rolled and returned to operation while the cluster transparently maintains a mixed-version environment for the base operating system, cluster, and Worldwide Language Support (WLS) software. Clients accessing services are not aware that a rolling upgrade is in progress.

A rolling upgrade uses the same procedure whether you are patching the cluster or upgrading to a new operating system or TruCluster version. The only difference is the utility that you run during the install stage: the dupatch utility for a rolling patch, or the installupdate utility for a rolling upgrade.

Note

See Chapter 2 for an overview of the dupatch utility and Chapter 4 for step-by-step instructions for using dupatch.

A roll consists of a series of stages (described in Section 5.2) that must be performed in a fixed order. When patching a cluster, the commands that control a rolling upgrade and enforce this order are clu_upgrade and dupatch.

You can perform only one rolling upgrade at a time. You cannot start another roll until the first roll is completed.

Note

A rolling upgrade updates the file systems and disks that the cluster currently uses; it does not update the disk or disks that contain the Tru64 UNIX operating system that were used to create the cluster (the operating system on which you ran clu_create). Although you can boot the original operating system in an emergency, remember that the differences between the current cluster and the original operating system increase with each roll.

5.1.1    Tagged Files

A rolling upgrade updates the software on one cluster member at a time so that you can test the new software without disrupting critical services. In order to support two versions of software in the cluster during a roll, clu_upgrade creates a set of tagged files in the setup stage.

These tagged files are copies of current files with .Old.. prepended to the file name. For example, the tagged file for the vdump command is /sbin/.Old..vdump. Tagged files are created in the same file system as the original files.

Each tagged file has an AdvFS property, DEC_VERSION_TAG, set on it. If a member's sysconfigtab rolls_ver_lookup attribute is set to 1, pathname resolution includes determining whether a specified filename has a .Old..filename copy and whether the copy has the DEC_VERSION_TAG property set on it. If both conditions are met, the requested file operation is transparently diverted to use the .Old..filename version of the file.

Note that file system operations on directories are not subject to this .Old.. diversion. For example, when you list a directory with the ls command on any cluster member during a rolling upgrade, you will see both versions of a file.
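
For example, on the lead member (which never runs on tagged files), you can list both copies of the vdump command while a roll is in progress; the exact output varies:

    ls -l /sbin/vdump /sbin/.Old..vdump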

The upgrade commands control when a member runs on tagged files by setting that member's sysconfigtab rolls_ver_lookup variable. The commands set the value to 1 when the member must run on tagged files, and to 0 when the member must not run on tagged files. The only member that never runs on tagged files is the lead member (the first member to roll).
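
For example, to see whether the member you are logged in to is currently set to run on tagged files, you can look for the attribute in its sysconfigtab file. This is an illustrative check only; the upgrade commands set and clear the value automatically:

    grep rolls_ver_lookup /etc/sysconfigtab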

The following rules determine which files have tagged files automatically created for them in the setup stage:

The clu_upgrade command provides several command options to manipulate tagged files: check, add, remove, enable, and disable. When dealing with tagged files, take the following into consideration:
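
For example, to check the consistency of the cluster's tagged files, you can use the tagged subcommand form shown below. The same form is used in Section 5.5 to disable tagged files; see clu_upgrade(8) for the complete syntax:

    clu_upgrade tagged check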

5.1.2    Version Switch

A version switch manages the transition of the active version to the new version of an operating system. The active version is the one that is currently in use. The purpose of a version switch in a cluster is to prevent the introduction of potentially incompatible new features until all members have been updated.

For example, if a new version introduces a change to a kernel structure that is incompatible with the current structure, you do not want cluster members to use that new feature until all members have updated to the version that supports the new features.

At the start of a rolling upgrade, all members' active versions are the same as their new versions. During a roll, each member's new version is updated when it rolls. After all members have rolled, the switch stage sets the active version to the new version on all members. At the completion of the upgrade, all members' active versions are once again the same as their new versions.

The clu_upgrade command uses the versw command (described in versw(8)) to manage version transitions. The clu_upgrade command manages all the version switch activity when rolling individual members. In the switch stage, after all members have rolled, run the clu_upgrade switch command to complete the transition to the new software.

5.2    Rolling Upgrade Stages

This section takes a closer look at each of the rolling upgrade stages. Figure 5-1 provides a flow chart of the tasks and stages that are required to perform a rolling upgrade. (See Section 5.3 for the rolling upgrade procedure.)

Figure 5-1:  Rolling Upgrade Flow Chart

The stages are performed in the following order:

  1. Preparation stage (Section 5.2.1)

  2. Setup stage (Section 5.2.2)

  3. Preinstall stage (Section 5.2.3)

  4. Install stage (Section 5.2.4)

  5. Postinstallation stage (Section 5.2.5)

  6. Roll stage (Section 5.2.6)

  7. Switch stage (Section 5.2.7)

  8. Clean stage (Section 5.2.8)

5.2.1    Preparation Stage

Command                                     Where Run     Run Level
clu_upgrade -v check setup lead_memberid    any member    multiuser mode

During the preparation stage, you back up all important cluster data and verify that the cluster is ready for a roll. Before beginning a rolling upgrade, do the following:

  1. Back up the clusterwide root (/), /usr, and /var file systems. The backups should include all member-specific files in these file systems. If the cluster has a separate i18n file system, back up that file system. In addition, back up any other file systems that contain critical user or application data. (A sample backup command sequence follows this list.)

    Note

    If you perform an incremental or full backup of the cluster during a rolling upgrade, make sure to perform the backup on a member that is not running on tagged files. If you back up from a member that is using tagged files, you will back up the contents of the .Old.. files. Because the lead member never uses tagged files, you can back up the cluster from the lead member (or any other member that has rolled) during a rolling upgrade.

    Most sites have automated backup procedures. If you know that an automatic backup will take place while the cluster is in the middle of a rolling upgrade, make sure that backups are done on the lead member or on a member that has rolled.

  2. Choose one member of the cluster as the first member to roll. This member, known as the lead member, must have direct access to the root (/), /usr, /var, and if used, i18n file systems.

    Make sure that the lead member can run any critical applications. You can test these applications after you update this member during the install stage, but before you roll any other members. If there is a problem, you can try to resolve it on this member before you continue. If there is a problem that you cannot resolve, you can undo the rolling upgrade and return the cluster to its pre-roll state. (Section 5.5 describes how to undo rolling upgrade stages.)

  3. Run the clu_upgrade -v check setup lead_memberid command, which verifies that:

    Note

    The clu_upgrade -v check setup lead_memberid command may check some — but not all — file systems for adequate space. Make sure that you manually check that your system meets the disk space requirements described later in this section.
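
The following is a minimal sketch of the backup described in step 1, assuming AdvFS file systems and a local tape drive. The vdump options and the tape device name are illustrative; substitute your site's backup procedure and devices:

    # Back up the clusterwide file systems to a no-rewind tape device
    vdump -0 -u -f /dev/ntape/tape0_d1 /
    vdump -0 -u -f /dev/ntape/tape0_d1 /usr
    vdump -0 -u -f /dev/ntape/tape0_d1 /var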

A cluster can continue to operate during a rolling upgrade or a patch because there are two copies of almost every file. (There is only one copy of some configuration files so that changes made by any member are visible to all members.) This approach makes it possible to run two different versions of the base operating system and the cluster software at the same time in the same cluster. The trade-off is that, before you start an upgrade or patch, you must make sure that there is adequate free space in each of the clusterwide root (/), /usr, and /var file systems, and, if there is a separate domain for the Worldwide Language Support (WLS) subsets, in the i18n file system.
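
For example, to check the current free space in the clusterwide file systems before you begin (df reports 512-byte blocks, as shown in the sample output later in this chapter):

    df / /usr /var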

A rolling upgrade has the following disk space requirements:

If a file system needs more free space, use AdvFS utilities such as addvol to add volumes to domains as needed. For information on managing AdvFS domains, see the AdvFS Administration manual. Note that you can expand the clusterwide root (/) domain.
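
For example, to add a volume to the clusterwide root domain (the disk partition name is illustrative; use an unused device at your site):

    addvol /dev/disk/dsk5c cluster_root

The cluster_var domain, mentioned in the roll-stage note in Section 5.3, can be extended the same way if /var needs more space.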

5.2.2    Setup Stage

Command                            Where Run     Run Level
clu_upgrade setup lead_memberid    any member    multiuser mode

The clu_upgrade setup lead_memberid command performs the following tasks:

Caution

Make sure your system meets the space requirements described in Section 5.2.1 before issuing the clu_upgrade setup command.

5.2.3    Preinstall Stage

Command                   Where Run      Run Level
clu_upgrade preinstall    lead member    multiuser mode

The purpose of the preinstall stage is to verify that the cluster is ready for the lead member to run the installupdate or dupatch commands and, if the upgrade includes update installation, to copy the new TruCluster Server kit so that the kit will be available during the install stage. If you will perform an update installation when you perform the step-by-step upgrade procedure in Section 5.3, remember to mount the new TruCluster Server kit before you run the preinstall command.
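
For example, if the new TruCluster Server kit is on CD-ROM, you might mount it read-only before running the preinstall command; the device name and mount point are illustrative:

    mount -r /dev/disk/cdrom0c /mnt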

The clu_upgrade preinstall command performs the following tasks:

5.2.4    Install Stage

Command          Where Run      Run Level
installupdate    lead member    single-user mode
dupatch          lead member    single-user mode or multiuser mode

The install stage starts when the clu_upgrade preinstall command completes, and continues until you run the clu_upgrade postinstall command.

The lead member must be in single-user mode to run the installupdate command, and single-user mode is recommended for the dupatch command. To take the lead member to single-user mode, halt it and then boot it to single-user mode.

When the system is in single-user mode, run the bcheckrc and init -s commands before you run either the installupdate or dupatch command. See the Tru64 UNIX Installation Guide for information on how to use these commands.
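
The following is a minimal sketch of that sequence on the lead member; it mirrors the commands used in the roll stage in Section 5.3, and the >>> line is entered at the console prompt:

    /sbin/shutdown -h now    # halt the lead member
    >>> boot -fl s
    /sbin/init s             # once in single-user mode
    /sbin/bcheckrc           # check and mount local file systems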

In the install stage, you can perform one of the following:

Note

If you run clu_upgrade status after running installupdate, clu_upgrade will print a line indicating that the install stage is complete. However, the install stage is not complete until you run the clu_upgrade postinstall command.

5.2.5    Postinstallation Stage

Command                    Where Run      Run Level
clu_upgrade postinstall    lead member    multiuser mode

The postinstallation stage verifies that the lead member has completed an update installation, a patch, or both. If an update installation was performed, clu_upgrade postinstall verifies that the lead member has rolled to the new version of the base operating system.

5.2.6    Roll Stage

Command             Where Run              Run Level
clu_upgrade roll    member being rolled    single-user mode

The lead member was upgraded in the install stage. The remaining members are upgraded one at a time in the roll stage.

The clu_upgrade roll command performs the following tasks:

5.2.7    Switch Stage

Command               Where Run     Run Level
clu_upgrade switch    any member    multiuser mode

The switch stage sets the active version of the software to the new version, which results in turning on any new features that had been deliberately disabled during the rolling upgrade.

The clu_upgrade switch command performs the following tasks:

Note

After the switch stage completes, you must reboot each member of the cluster, one at a time.

5.2.8    Clean Stage

Command              Where Run     Run Level
clu_upgrade clean    any member    multiuser mode

The clean stage cleans up the files and directories that were used for the rolling upgrade.

The clu_upgrade clean command performs the following tasks:

5.3    Rolling Upgrade Procedure

In the following procedure, unless otherwise stated, run commands in multiuser mode.

Note

If you have not yet created your cluster, it is recommended that you patch the operating system and TruCluster software before you create the cluster. See Section 3.3 for information.

Note

During a rolling upgrade, do not use the /usr/sbin/setld command to add or delete any of the following subsets:

Adding or deleting these subsets during a rolling upgrade creates inconsistencies in the tagged files.

Some stages of a rolling upgrade take longer to complete than others. Table 5-1 lists the approximate time it takes to complete each stage.

Table 5-1:  Time Estimations for a Rolling Upgrade

Stage                Duration
Preparation          Not under program control.
Setup                45 - 120 minutes. [Footnote 1]
Preinstall           15 - 30 minutes. [Footnote 1]
Install              The same as installing a patch kit on a single system;
                     approximately 35 minutes, depending upon the size of
                     the patch kit.
Postinstall          Less than 1 minute.
Roll (per member)    Patch: less than 5 minutes. Update installation: about
                     the same amount of time it takes to add a member.
Switch               Less than 1 minute.
Clean                30 - 90 minutes. [Footnote 1]

  1. Prepare the cluster (see Section 5.2.1):

    1. Back up the cluster.

    2. Choose a cluster member to be the lead member (the first member to roll). The examples in this procedure use the member whose memberid is 2 as the lead member. The member's host name is provolone.

    3. Make sure that your system contains the required space in all file systems as described in Section 5.2.1. If a file system needs more free space, use AdvFS utilities such as addvol to add volumes to domains as needed. For information on managing AdvFS domains, see the AdvFS Administration manual. Note that the clu_upgrade -v check setup lead_memberid command may check some — but not all — file systems for adequate space. Make sure that you manually check that your system meets the disk space requirements described in Section 5.2.1.

    4. On any member, run the clu_upgrade -v check setup lead_memberid command to determine whether the cluster is ready for an upgrade. For example:

      clu_upgrade -v check setup 2
      

  2. Perform the setup stage (Section 5.2.2).

    On any member, run the clu_upgrade setup lead_memberid command. For example:

    clu_upgrade setup 2
    

    Caution

    If any file system fails to meet the minimum space requirements, the program will fail and generate an error message similar to the following:

    *** Error ***
    The tar commands used to create tagged files in the '/' file system have
    reported the following errors and warnings:
    NOTE: CFS: File system full: /
     
            tar: sbin/lsm.d/raid5/volsd : No space left on device
            tar: sbin/lsm.d/raid5/volume : No space left on device
    NOTE: CFS: File system full: /
     
    .NOTE: CFS: File system full: /
     
     
    

    If you receive this message, run the clu_upgrade undo setup command, free up the required amount of space on the affected file systems, and then rerun the clu_upgrade setup command.

    During the setup stage, clu_upgrade asks whether you are performing an update installation or a patch. However, the wording of the prompts in the Version 5.0A command is somewhat ambiguous:

    Are you running the clu_upgrade command to upgrade to a new version of
    the base operating system and cluster software? [yes]:
     
    Are you running the clu_upgrade command in order to apply a rolling
    patch? [yes]
    

    The clu_upgrade command does not display the second prompt until it receives an answer for the first. An administrator might be tempted to answer yes to the ... upgrade to a new version ... prompt when performing a rolling upgrade to patch the cluster because a patch is an upgrade to new software. However, if you see these prompts, answer yes to the first prompt only if you plan to run installupdate during the install stage.

    Note: No WLS and Disk Space

    Additional space is required in the cluster_root domain for backing up member files on clusters without Worldwide Language Support (WLS). If no space is available, the following message is displayed:

    *** Error ***
    There is no space available in the root (/), /usr, or /var
    file systems to back up member ''???'' member-specific files.
    Increase the available disk space on one of these file systems
    and rerun this stage of the upgrade.
    

    The available space in the cluster_root domain must be greater than the sum of the sizes of all of the member directories in the root (/), /usr, and /var file systems.

    To view the available space in the cluster_root domain, enter the following command:

    df /
    

    For example:

    df /
    Filesystem       512-blocks   Used  Available  Capacity  Mounted on
    cluster_root#root    524288 175710     330512     35%    /
    

    To calculate the minimum required value, enter the following command:

    ksh -c 'du -s {,/usr,/var}/cluster/members/member?*/' | \
      awk '{minimum+=$1}; END{print minimum}'
    

    For example:

    ksh -c 'du -s {,/usr,/var}/cluster/members/member?*/' | \
    > awk '{minimum+=$1}; END{print minimum}'
    679030
    

    The example indicates that the cluster_root domain needs 348518 more blocks (679030 minus 330512), or approximately 175 MB of disk space. Use the addvol command to add additional volumes to the cluster_root domain.

  3. When asked if you want to continue the cluster upgrade, accept the default of yes:

     This is the cluster upgrade program.
     You have indicated that you want to perform the 'setup' stage of the
     upgrade.
     
     Do you want to continue to upgrade the cluster? [yes]: [Return]
     
     Are you running the clu_upgrade command to upgrade to a new version of
     the base operating system and cluster software? [yes]: no
     
     Are you running the clu_upgrade command to apply a rolling patch? [yes]: [Return]
    

    Note that these prompts will change if you run the upgrade to its conclusion and then rerun it to remove patches. See Section 5.6 for more information (including the prompts you will see).

  4. One at a time, reboot all cluster members except the lead member.
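
    For example, log in to each member in turn (except the lead member) and reboot it, waiting for the member to rejoin the cluster before rebooting the next one:

    /sbin/shutdown -r now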

  5. Perform the preinstall stage (Section 5.2.3).

    Note

    If you plan to run installupdate in the install stage, mount the device or directory that contains the new TruCluster Server kit before running clu_upgrade preinstall. The preinstall command will copy the kit to the /var/adm/update/TruClusterKit directory.

    On the lead member, run the following command:

    clu_upgrade preinstall
    

  6. Manually relocate CAA services from the lead member to another cluster member before performing the install stage. For example:

    /usr/sbin/caa_relocate -s lead_member -c non_lead_member
    

  7. Perform the install stage (Section 5.2.4).

    Caution

    If you encounter unrecoverable failures while running dupatch, do not run the clu_upgrade undo install command.

    Contact your support personnel for further instructions.

    You can patch a cluster or update cluster and operating system software.

    You can perform a rolling upgrade to patch a cluster in either single-user mode, which is recommended, or in multiuser mode:

    See Chapter 4 for information about using the dupatch utility.

    1. Run the lmf reset command:

      lmf reset
      

    If you are performing a roll that includes both an upgrade and a patch, do the update installation first and then the patch installation.

    After the lead member performs its final reboot with its new custom kernel, perform the following manual tests before you roll any additional members:

    Verify that the newly rolled lead member can serve the shared root (/) file system.

    1. Use the cfsmgr command to determine which cluster member is currently serving the root file system. For example:

      cfsmgr -v -a server /
       
       Domain or filesystem name = /
       Server Name = polishham
       Server Status : OK
      

    2. Relocate the root (/) file system to the lead member. For example:

      cfsmgr -h polishham -r -a SERVER=provolone /
      

    Verify that the lead member can serve applications to clients. Make sure that the lead member can serve all important applications that the cluster makes available to its clients.

    You decide how and what to test. Thoroughly exercise all critical applications and satisfy yourself that the lead member can serve these applications to clients before continuing the roll. For example, you can:

    1. Manually relocate CAA services to the lead member. For example, to relocate an application resource named clock to lead member provolone:

      caa_relocate clock -c provolone
      

    2. Temporarily modify the default cluster alias attributes for the lead member so that it handles routing for the alias and serves all client requests that are directed to the alias. For example:

      cluamgr -a alias=DEFAULTALIAS,rpri=100,selp=100
      cluamgr -r start
      

      The lead member is now handling all traffic that is addressed to the default cluster alias. (You can use the arp -a command to verify that the lead member has the permanent published entry for the default cluster alias.)

      From another member or from an outside client, use services such as telnet and ftp to verify that the lead member can handle alias traffic. Test client access to all important services that the cluster provides. When you are satisfied, reset the alias attributes on the lead member to their original values.

  8. Perform the postinstallation stage (Section 5.2.5).

    On the lead member, run:

    clu_upgrade postinstall
    

  9. Perform the roll stage (Section 5.2.6).

    One at a time, on each member of the cluster that has not rolled, do the following:

    1. Manually relocate CAA services from the member to another cluster member before performing the roll stage. For example:

      /usr/sbin/caa_relocate -s member_to_roll \
        -c another_member
      

    2. Take the member to single-user mode by first halting the member and then booting to single-user mode. Before halting the member, make sure that the cluster can maintain quorum without the member's vote. For information about maintaining quorum when shutting down a member, see the chapter on Managing Cluster Members in the Version 5.1A Cluster Administration manual.

      /sbin/shutdown -h now
      

      Note

      Halting and booting the system ensures that it provides the minimal set of services to the cluster and that the running cluster has a minimal reliance on the member running in single-user mode. In particular, halting the member satisfies services that require the cluster member to have a status of DOWN before completing a service failover. If you do not first halt the cluster member, there is a high probability that services will not fail over as expected.

    3. Boot the member:

      >>> boot -fl s
      
      

    4. When the system reaches single-user mode, run the init s, bcheckrc, kloadsrv, and lmf reset commands. For example:

      /sbin/init s
      /sbin/bcheckrc
      /sbin/kloadsrv
      /usr/sbin/lmf reset
      

    5. Roll the member:

      clu_upgrade roll
      

      When the member boots its new kernel, it has completed its roll and is no longer running on tagged files. Continue to roll members until all members of the cluster have rolled.

      Note: /var Disk Space

      The following messages might be displayed while running the clu_upgrade roll command:

      Backing up member-specific data for member: n
       ...NOTE: CFS: File system full: /var
       
        tar: /dev/tty Unavailable
       
      *** Error ***
      An error was detected while backing up member 'n' \
      member-specific files.
      

      Additional space in the cluster_var domain is required. To view the available space in the cluster_var domain, enter the following command:

      df /var
      

      To calculate the required value, enter the following command:

      ksh -c 'du -s {,/usr,/var}/cluster/members/member?*/' | \
        awk '{minimum+=$1}; END{print minimum}'
      

      Use the addvol command to add additional volumes to the cluster_var domain.

  10. Perform the switch stage (Section 5.2.7).

    After all members have rolled, run the following command on any member to enable any new software features that were deliberately disabled until all members have rolled:

    clu_upgrade switch
    

  11. One at a time, reboot each member of the cluster.

  12. Perform the clean stage (Section 5.2.8).

    Run the following command on any member to remove the tagged (.Old..) files from the cluster and complete the upgrade.

    clu_upgrade clean
    

5.4    Displaying the Status of a Rolling Upgrade

The clu_upgrade command provides the following options for displaying the status of a rolling upgrade or patch. You can run status commands at any time.

Note

During a roll, there might be two versions of the clu_upgrade command in the cluster: an older version used by members that have not yet rolled, and a newer version (if one is included in the update distribution or patch kit). When checking status, the information that is displayed by the status command might differ depending on whether the command is run on a member that has rolled. Therefore, if you run the status command on two members, do not be surprised if the format and content of the displayed output are not the same.
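
For example, either of the following commands can be run from any member; the -v form reports additional detail, such as which members are currently running on tagged files (see Section 5.5):

    clu_upgrade status
    clu_upgrade -v status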

5.5    Undoing a Stage

The clu_upgrade undo command provides the ability to undo a rolling upgrade that has not completed the switch stage. You can undo any stage except the switch stage and the clean stage.

Note

See Section 5.6 for information about deleting patches installed during a rolling upgrade.

To undo a stage, use the undo command with the stage that you want to undo. The clu_upgrade command determines whether the specified stage is a valid stage to undo. Table 5-2 outlines the requirements for undoing a stage:

Caution

If you encounter unrecoverable failures while running dupatch, do not run the clu_upgrade undo install command.

Contact your support personnel for further instructions.

Table 5-2:  Undoing a Stage

Setup: clu_upgrade undo setup

You must run this command on the lead member. In addition, no members can be running on tagged files when you undo the setup stage.

Before you undo the setup stage, use the clu_upgrade -v status command to determine which members are running on tagged files. Then use the clu_upgrade tagged disable memberid command to disable tagged files on those members.

When no members are running on tagged files, run the clu_upgrade undo setup command on the lead member.

Preinstall: clu_upgrade undo preinstall

You must run this command on the lead member.

Install: clu_upgrade undo install

You can run this command on any member except the lead member. Halt the lead member. Then run the clu_upgrade undo install command on any member that has access to the halted lead member's boot disk. When the command completes, boot the lead member.

Postinstall: clu_upgrade undo postinstall

You must run this command on the lead member.

Roll: clu_upgrade undo roll memberid

You can run this command on any member except the member whose roll is being undone. Halt the member whose roll stage is being undone. Then run the clu_upgrade undo roll memberid command on any other member that has access to the halted member's boot disk. When the command completes, boot the halted member. The member will now be using tagged files.

Note

You might see the following error message when running the clu_upgrade undo postinstall command:

*** Error ***
The 'undo' option cannot be run at the 'postinstall' stage,
either because the next stage has already been started or
because the stage specified for undo has not been started.

If you see the message, remove the following file before running the clu_upgrade undo postinstall command:

rm /cluster/admin/clu_upgrade/roll.started

5.6    Removing Patches Installed During a Rolling Upgrade

The following sections describe how to remove or reinstall patches during a rolling upgrade.

5.6.1    Steps Prior to the Switch Stage

At any time prior to issuing the clu_upgrade switch command, you can remove some or all of the patches you installed during the rolling upgrade by returning to the install stage, rerunning dupatch, and selecting the Patch Deletion item in the Main Menu. See Section 4.12 for information about removing patches with dupatch.

You can also reinstall some or all of the patches you removed by rerunning dupatch. (See Section 5.5 for information about undoing any of the rolling upgrade stages.)

5.6.2    Steps for After the Switch Stage

To remove patches after you have issued the clu_upgrade switch command, you will have to complete the current rolling upgrade procedure and then rerun the procedure from the beginning (starting with the setup stage).

When you run the install stage, you must bring down your system to single-user mode as described in steps 1 through 6 of Section 4.8.1.1. When you rerun dupatch (step 7), select the Patch Deletion item in the Main Menu. See Section 4.12 for information about removing patches with dupatch.

If the patch uses the version switch, you can still remove the patch, even after you have issued the clu_upgrade switch command. Do this as follows:

  1. Complete the current rolling upgrade procedure.

  2. Undo the patch that uses the version switch by following the instructions in the release note for that patch. Note that the last step to undo the patch will require a shutdown of the entire cluster.

  3. Rerun the rolling upgrade procedure from the beginning (starting with the setup stage). When you rerun dupatch, select the Patch Deletion item in the Main Menu.

To learn which patches use the version switch, use the following command:

# grep -l PATCH_REQUIRES_VERSION_SWITCH=\"Y\" /usr/.smdb./*PAT*.ctrl

For information about version switches, see Section 5.1.2.

Note

If you rerun the rolling upgrade procedure to remove patches, the prompts you receive during the setup stage will be different from those issued during the initial rolling upgrade. Those prompts will look as follows:

Do you want to continue to upgrade the cluster? [yes]: [Return]

What type of upgrade will be performed?
 
1) Rolling upgrade using the installupdate command
2) Rolling patch using the dupatch command
3) Both a rolling upgrade and a rolling patch
4) Exit cluster software upgrade
 
Enter your choice: 2
 
 

The sample installation in Section B.2 shows the prompts you will see during the initial rolling upgrade.