September 2004
Copyright © 2004 Sun Microsystems, Inc. All rights reserved.

Netra™ High Availability (HA) Suite Foundation Services 2.1 6/03 Release Notes


Contents

    Introduction

    The nhinstall Tool Template Files

    Supported Software Versions

    Product Notes

    Product Restrictions

    Documentation Errata and Addenda


Introduction

These Release Notes contain important product notes and known restrictions for the Netra HA Suite Foundation Services 2.1 6/03. Workarounds for known bugs are provided where possible. Where these Release Notes differ from the Netra HA Suite Foundation Services 2.1 6/03 documentation set, the information in these Release Notes takes precedence. In the rest of this document, the product is referred to as the Foundation Services.

For information about the supported hardware, see the Netra High Availability Suite Foundation Services 2.1 6/03 Hardware Guide. For information about the packages and patches delivered in the software delivery, see the Netra High Availability Suite Foundation Services 2.1 6/03 README.

If you are planning to upgrade your cluster from Foundation Services 2.1 to Foundation Services 2.1 6/03, install the documentation packages as described in the Netra High Availability Suite Foundation Services 2.1 6/03 README. After you have installed the documentation, see "Upgrading the Cluster" in the Netra High Availability Suite Foundation Services 2.1 6/03 Custom Installation Guide.



The nhinstall Tool Template Files

This section describes the template files required by the nhinstall tool.

The nhinstall tool is delivered with a set of templates for the addon.conf configuration file. The addon.conf file defines additional packages and patches that are not delivered with the Foundation Services software delivery. For information about how to use the templates to create the addon.conf file for your cluster, see the addon.conf(4) man page.

The following platform-specific addon.conf template files contain additional packages and patches for each hardware platform:


Supported Software Versions

This section lists the software you can use with the Foundation Services and specifies the versions of this software for different types of hardware.

The software supported with the Foundation Services 2.1 6/03 is as follows:

Foundation Services Patches

Since the June 2003 (6/03) release, three patches containing enhancements and bug fixes have been delivered for the Foundation Services. Install all three patches. Their names and download locations are as follows:


Patch Name    Download Location
114175        http://sunsolve.sun.com/point
115606        http://sunsolve.france.sun.com/cgi/show.pl?target=home
115644        http://sunsolve.france.sun.com/cgi/show.pl?target=home

Patch 114175 has a dependency on the SNDR patch 116710. For installation information specific to these Foundation Services patches, see the README provided with each patch.

SNDR Patch

The Netra HA Suite download contains three SNDR patches: 113054-04, 113055-01, and 113057-03. Do not install these SNDR patches if you have installed version -02 or later of the Foundation Services patches 114175, 115606, and 115644. Instead, install patch 116710, which replaces all three. Patch 116710 is available on SunSolve at http://sunsolve.sun.com/point.

CGTP Patch

The Netra HA Suite download contains a Solaris patch for CGTP, T112281-02.

The Solaris patch for CGTP that you install depends on the version of Solaris you are installing on the cluster. Use the following table to choose the correct Solaris patch for CGTP to install on your cluster:

Solaris Version                              Solaris Patch for CGTP               Location of Patch
Solaris 8 2/02                               T112281-02                           Part of Foundation Services distribution
Solaris 8 2/02 and kernel patch 108525-21    T116036-02
Solaris 8 PSR3                               T116036-02
Solaris 9                                    112902-06, 112904-01, 112917-01,
                                             112918-01, 112919-01 (these patches
                                             are obsoleted by 112233-01 or later)
Solaris OS release 9/02 or later             No CGTP patch required               N/A

Software Platform Version

Depending on the hardware you are using, you might require a minimum level of software platform. Netra CT 810 and Netra CT 410 require at least RRL6.1. Netra CT 820 boards require at least DVD0-11. Netra CP2300 with the Rapid Development Kit chassis requires at least DVD0-10.

Version of Solaris Operating System for Different Hardware

The version of Solaris you install on a cluster depends on the hardware you are using.



Product Notes

Supported Cluster Size

The Foundation Services are supported on a cluster of up to 18 nodes.

SMCT

The SMCT tool is not supported at the current patch level of the Foundation Services.

CMM API Is a 64-bit Library

The CMM API is now a 64-bit library. The 64-bit library provides the same API as the 32-bit library used in the Foundation Services 2.1 release.

The modification has no impact on existing 32-bit applications. However, 32-bit applications compiled with the 64-bit library might trigger compilation warnings. To remove these warnings, you might need to replace ulong types with uint32_t types.

Do Not Use the reboot Command

When rebooting a master-eligible node on a running cluster, do not use the reboot command. Instead, use the init command as the root user, as follows:

# init 6

Using the reboot command kills processes in an indeterminate order and therefore does not respect the required sequence for stopping services. This can lead to inconsistencies in data replication.

Do Not Schedule Major Tasks When the Cluster Is Unsynchronized

When a master-eligible node is reintegrated into the cluster (for example, after maintenance or a failure), there is a period during which its disk partitions are resynchronizing. While the cluster is unsynchronized, the data on the master node disk is not fully backed up. Avoid scheduling major tasks while the cluster is unsynchronized.


Product Restrictions

This section lists the known bugs and their workarounds where available. The bugs are described in the following subsections:

    Cluster Membership Manager (CMM) Bugs

    Reliable NFS Bugs

    Carrier Grade Transport Protocol (CGTP) Bugs

    Reliable Boot Service (RBS) Bugs

    Daemon Monitoring (nhpmd) Bugs

Cluster Membership Manager (CMM) Bugs

Bug 4697437: Notifications of Diskless Node State Transitions Can Be Lost
--------------------------------------------------------------------------------------------

Notifications that describe the difference between an initial state and a final state are emitted by the CMM on the master node when the cluster membership changes. The CMM running on a diskless node can miss notifications for transitory states. For example, when a cluster passes through three states (CC1, CC2, and CC3), a notification should be emitted to describe the transition from CC1 to CC2, and then another to describe the transition from CC2 to CC3. In this release of the product, a diskless node might receive only the notification for the overall transition from CC1 to CC3, and miss the notification for the transient state CC2.

When a cluster passes from state CC1 to CC2, and then back to state CC1, the diskless node might not receive any notification.

Bug 4746183: Single Point of Failure Occurs Immediately After Switchover
-------------------------------------------------------------------------------------------

A single point of failure exists for a brief period after a switchover. It lasts until Reliable NFS has received both of the following notifications from the CMM: MASTER_ELECTED and VICE_MASTER_ELECTED.

You can use the nhcmmstat tool to check which notifications have been received.

If the newly elected master node reboots before the notifications are received, see the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide for information about how to recover a cluster.

Bug 4740446: Switchover Is Initiated Even Though the CMM_FLAG_SYNCHRO_NEEDED Flag Is Set
---------------------------------------------------------------------------------------------------------------------------

There is a short window between the moment a command to change the synchronization state is issued at the API level and the moment the nhcmmd daemon handles that command. If a switchover request is issued inside this window, the request is accepted even though the cluster is no longer synchronized.

In this scenario, the call that Reliable NFS makes to clear the CMM_FLAG_SYNCHRO_NEEDED flag fails because a switchover is in progress. As a result, the master node reboots and replication stops until the vice-master node is rebooted.

Verify that the CMM_FLAG_SYNCHRO_NEEDED flag is clear before requesting a switchover.

To recover from this problem, reboot the vice-master node.

Bug 4751051: Heartbeat of the Master Node Can Be Lost During Synchronization
---------------------------------------------------------------------------------------------------

During a full synchronization between the master node and the vice-master node, the following message might be displayed on the vice-master node console: master loss detected, but cannot switchover.

This message is generated because the network load prevents the vice-master node from detecting all of the master node heartbeats. As a result, the vice-master node can wrongly conclude that the master node has failed.

Because the synchronization is in progress, the vice-master node cannot take the master role and there is no impact on the master node.

While the vice-master node cannot detect the heartbeat of the master node, the synchronization is paused.

Bug 4749139: Library Clients Should Rely on Local Notifications Only
---------------------------------------------------------------------------------------

When a master-eligible node is elected as the vice-master node, the master node notifies the other peer nodes just before the data in the master node API module is updated.

As a consequence, the cmm_vicemaster_getinfo() function called on the master node can fail and return a CMM_ESRCH error, even though the CMM library clients on the other peer nodes have already received the CMM_VICEMASTER_ELECTED notification.

See the Netra High Availability Suite Foundation Services 2.1 6/03 CMM Programming Guide for more information.

Bug 4796226: cmm_mastership_release on Master Node Returns Incorrect Value if Vice-Master Out
--------------------------------------------------------------------------------------------------------------------------

When the cmm_mastership_release function is run on the master node, it checks for the presence of the vice-master node. If the vice-master node is OUT_OF_CLUSTER, the function should return the CMM_ECANCELED value.

Instead, the function returns the CMM_ETIMEDOUT value.

Bug 4854761: cmm_membership_remove on Master Node Causes Errors if Vice-Master Node Fails
-----------------------------------------------------------------------------------------------------------------------

If the vice-master node fails while the cmm_membership_remove function is running on the master node, the CMM_OK value is returned but the master node does not behave correctly.

The master node does the following:

Bug 4845598: Diskless Node Emits CMM_INVALID_CLUSTER Notification When Master Is Disqualified
-----------------------------------------------------------------------------------------------------------------------------

When the master node is disqualified by the cmm_membership_qualif function, the nhcmmd daemon on an associated diskless node might emit a CMM_INVALID_CLUSTER notification.

Ignore the notification. The cluster is up and running.

Bug 4928087: switchover + full synch operation generates a duplicate floating address
--------------------------------------------------------------------------------------------------------

When you perform a switchover (/opt/SUNWcgha/sbin/nhcmmstat -c so) in parallel with a full synchronization (/opt/SUNWcgha/sbin/nhcrfsadm -f) while the two master-eligible nodes are synchronized, the following series of events occurs:

As a result of this sequence of events, the Reliable NFS cannot set the master IP address to DOWN because this action cannot take place while a full synchronization is in progress.

If you encounter this problem, wait until the full synchronization is complete. This might take some time.

Reliable NFS Bugs

Bug 4624575: Clients Hang When the Vice-Master Node Is Stopped
-----------------------------------------------------------------------------------
Halting the vice-master node when clients are writing data to the master node might cause clients to hang for up to 16 seconds before continuing processing.

This problem does not occur when the vice-master node is shut down using the procedures described in the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide.

Bug 4960188: NFS client server deadlock
--------------------------------------------------

The Solaris operating system does not support configurations in which an NFS client on a node accesses data exported by an NFS server on the same node. In such a configuration, if the NFS client writes large files to the NFS server, the operating system can deadlock and the node can hang.

This situation can occur if an application on the vice-master node fails over or is switched over to the master node. The master node hangs and might not be able to function as the master. If you encounter this situation, reboot the hung node.

Bug 4964345: SNDR sets using sector 0 fail, which is not detected by nhinstall/nhadm
-------------------------------------------------------------------------------------------------------

Do not use sector zero in any slice that will be replicated. If you do, the cluster hangs on the final step of SNDR synchronization. You might encounter this situation if you install more than one disk on a node using nhinstall.

Bug 5012570: After adding a diskless: error during a switchover (cannot unshare ...)
-----------------------------------------------------------------------------------------------------

If you add a new diskless node to the cluster using nhinstall, the following error message appears on the master node when you perform a switchover:

Mar 12 16:40:35 men252-1-cgtp nfs unshare: /export/exec/Solaris_8_sparc.all/usr: not shared

This error occurs because the share command definition for the /usr directory of diskless nodes appears twice in the nhfs.conf file.

An example of this entry in the nhfs.conf file is as follows:

RNFS.Share.1=share -F nfs -o ro /export/exec/Solaris_8_sparc.all/usr

If you encounter this error, remove one of the duplicate definitions in the nhfs.conf file.

Carrier Grade Transport Protocol (CGTP) Bugs

Bug 4740370: CGTP Broadcast IRE Are Not Recreated After plumb or unplumb
-------------------------------------------------------------------------------------------------

Use of the ifconfig command to plumb or unplumb the CGTP interface is not supported. Using the ifconfig command in this way can lead to unexpected cluster outage.

An action on a single interface leaves CGTP broadcasts inoperative. Broadcasts replicated by CGTP might not be delivered if one of the underlying incoming interfaces is down or, for the same reason, if the interface has been unplumbed. CGTP broadcasts cannot survive the abrupt unplumbing and replumbing of the underlying network interfaces.

The only way for CGTP broadcasts to survive an ifconfig unplumb is to always respect the following sequence of operations:

Reliable Boot Service (RBS) Bugs

Bug 4621703: Boot Server Allocates the Same IP Address to Two Diskless Nodes
---------------------------------------------------------------------------------------------------

When a diskless node is either stopped or disconnected from the network and then restarted or reconnected without rebooting the operating system, another diskless node booting at the same time can be allocated the first diskless node's IP address. This can happen in any of the following situations:

Daemon Monitoring (nhpmd) Bugs

Bug 5063808: Different Recovery response from Reference Manual
----------------------------------------------------------------------------------

The following discrepancies exist between the expected and the reported behavior of nhpmd:


Documentation Errata and Addenda

Changed Documentation Set

Many references to the SMCT installation tool have been removed from the Netra High Availability Suite Foundation Services 2.1 6/03 documentation set because this feature is not supported at the current patch level of the Foundation Services.

Product README

The product README delivered with the Netra High Availability Suite Foundation Services 2.1 6/03 contains an error. The patch T112281-02 is the Solaris 8 2/02 patch for CGTP. This patch is not for use on Solaris 8 PSR 1.

For the documentation set delivered with the current patch level of the product, you can no longer browse the contents of the /opt/SUNWcgha/doc/html directory through the index file /opt/SUNWcgha/doc/html/index.html.

Figures Do Not Always Appear On the Correct Page In PDF Documents

In PDF documents, a figure might appear on the page after the one given in the Table of Figures.


Copyright © 2004 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Sun, Sun Microsystems, the Sun logo, Java, JMX, Netra, Solaris, Sun StorEdge, docs.sun.com, Solaris JumpStart, Sun Fire, Javadoc, JDK, Sun4U, Jini, OpenBoot, Sun WorkShop, Solstice DiskSuite, and Forte are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Federal Acquisitions: Commercial Software - Government Users Subject to Standard License Terms and Conditions.
