September 2004
Copyright © 2004 Sun Microsystems, Inc. All rights reserved.
These Release Notes contain important product notes and known restrictions in the Netra HA Suite Foundation Services 2.1 6/03. Details of workarounds to known bugs are provided where possible. In cases where there are differences between these Release Notes and the Netra HA Suite Foundation Services 2.1 6/03 documentation set, the information in these Release Notes takes precedence. In the rest of this document, the product is referred to as the Foundation Services.
For information about the supported hardware, see the Netra High Availability Suite Foundation Services 2.1 6/03 Hardware Guide. For information about the packages and patches delivered in the software delivery, see the Netra High Availability Suite Foundation Services 2.1 6/03 README.
If you are planning to upgrade your cluster from Foundation Services 2.1 to Foundation Services 2.1 6/03, install the documentation packages as described in the Netra High Availability Suite Foundation Services 2.1 6/03 README. After you have installed the documentation, see "Upgrading the Cluster" in the Netra High Availability Suite Foundation Services 2.1 6/03 Custom Installation Guide.
This section describes the template files required by the nhinstall tool.
The nhinstall tool is delivered with a set of templates for the addon.conf configuration file. The addon.conf file defines additional packages and patches that are not delivered with the Foundation Services software delivery. For information about how to use the templates to create the addon.conf file for your cluster, see the addon.conf(4) man page.
The following platform-specific addon.conf template files contain additional packages and patches for each hardware platform:
addon.conf.NETRA-20.template
addon.conf.NETRA-T1.template
addon.conf.NETRA-CT.template (for the Netra CT 410 and Netra CT 810)
This section lists the software you can use with the Foundation Services and specifies the versions of this software for different types of hardware.
The software supported with the Foundation Services 6/03 is as follows:
Java Dynamic Management Kit 5.0
SPARC Solaris 8 2/02, Solaris 8 HW 7/03, Solaris 9, or Solaris 9 9/02
Java 2 Software Development Kit Standard Edition
Version 1.3.1 for Solaris 8 2/02 and Solaris 8 HW 7/03.
Version 1.4 for Solaris 9 and Solaris 9 9/02.
Sun StorEdge Network Data Replicator (SNDR) 3.1 with patches
Volume Management
Solstice DiskSuite Version 4.2.1 for Solaris 8 2/02. For installation information, see the Solstice DiskSuite 4.2.1 Installation and Product Notes.
Solaris Volume Manager for Solaris 9 and Solaris 9 9/02. For installation information, see the Solaris Volume Manager Administration Guide.
Since June 2003 (6/03), three patches containing enhancements and bug fixes for the Foundation Services have been delivered. Install these patches.
The names and locations of these patches are as follows:
Patch Name | Download Location
114175 | http://sunsolve.sun.com/point
115606 | http://sunsolve.france.sun.com/cgi/show.pl?target=home
115644 | http://sunsolve.france.sun.com/cgi/show.pl?target=home
Patch 114175 has a dependency on the SNDR patch 116710. For installation information specific to these Foundation Services patches, see the README provided with each patch.
The Netra HA Suite download contains three SNDR patches: 113054-04/, 113055-01/, and 113057-03/.
Do not install these SNDR patches if you have installed version -02 or later of the Foundation Services patches 114175, 115606, and 115644. Instead, install patch 116710, which replaces these three SNDR patches. This SNDR patch is available on SunSolve at http://sunsolve.sun.com/point.
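Deciding which SNDR patches to install depends on the revision suffix of each installed Foundation Services patch. The following is a minimal sketch of that comparison.

It assumes the following:
- Patch IDs have the form "NNNNNN-RR", as in "114175-02".
- The list of installed patch IDs has already been collected, for example from `showrev -p`. Parsing that output is not shown.

The function names are hypothetical. The sketch checks one patch ID at a time and does not decide whether the "-02 or later" condition must hold for all three patches or only for one.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse the revision from a patch ID of the form
 * "NNNNNN-RR" (for example "114175-02"). Returns -1 if there is no
 * numeric revision after the last '-'. */
int patch_revision(const char *patch_id)
{
    const char *dash = strrchr(patch_id, '-');
    char *end;
    long rev;

    if (dash == NULL || !isdigit((unsigned char)dash[1]))
        return -1;
    rev = strtol(dash + 1, &end, 10);
    if (*end != '\0')
        return -1;                 /* trailing garbage after the revision */
    return (int)rev;
}

/* Hypothetical helper: nonzero if patch_id is at revision min_rev or later. */
int patch_at_least(const char *patch_id, int min_rev)
{
    int rev = patch_revision(patch_id);
    return rev >= 0 && rev >= min_rev;
}
```

For example, `patch_at_least("114175-02", 2)` is true. If the condition holds, patch 116710 is installed instead of the three SNDR patches listed above.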
The Netra HA Suite download contains a Solaris patch for CGTP, T112281-02.
The Solaris patch for CGTP that you install depends on the version of Solaris you are installing on the cluster. Use the following table to choose the correct Solaris patch for CGTP to install on your cluster:
Solaris Version | Solaris Patch for CGTP | Location of Patch
Solaris 8 2/02 | T112281-02 | Part of Foundation Services distribution
Solaris 8 2/02 and kernel patch 108525-21 | T116036-02 |
Solaris 8 PSR3 | T116036-02 |
Solaris 9 | 112902-06, 112904-01, 112917-01, 112918-01, 112919-01 (these patches are obsoleted by 112233-01 or later) |
Release 9/02 or later of the Solaris operating system | No CGTP patch required | N/A
Depending on the hardware you are using, you might require a specific level of software platform. For Netra CT 810 and Netra CT 410, you require at least RRL6.1. For Netra CT 820 boards, you require at least DVD0-11 and for Netra CP2300 with the Rapid Development Kit chassis, you require at least DVD0-10.
The version of Solaris you install on a cluster depends on the hardware you are using.
Solaris 8 2/02: Netra 120, Netra CT 820, Netra T1, Netra 20
Solaris 8 HW 7/03: Netra 240, Sun Fire V210, Sun Fire V240
Solaris 9 (build 58 hwpl 3): Netra 120, Netra T1, Netra CT 410, Netra CT 810, Netra 20
Solaris 9 9/02: Netra 120, Netra T1, Netra CT 410, Netra CT 810, Netra 20
The Foundation Services are supported on a cluster of up to 18 nodes.
The SMCT tool is not supported at the current patch level of the Foundation Services.
The CMM API is now a 64-bit library. The 64-bit library provides the same API as the 32-bit library used in the Foundation Services 2.1 release.
The modification has no impact on existing 32-bit applications. However, 32-bit applications compiled with the 64-bit library might trigger a compilation warning. It might be necessary to replace ulong types with uint32_t types.
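The following is a minimal sketch of the suggested type change. Under an LP64 compilation model, `unsigned long` (`ulong`) widens from 32 to 64 bits, whereas `uint32_t` keeps the same width in both models.

It assumes a hypothetical application structure, `app_node_info`, whose fields were previously declared as `ulong`. The structure, field, and function names are illustrative only. They are not part of the CMM API.

```c
#include <stdint.h>

/* Hypothetical application structure (not part of the CMM API).
 * The fields were previously declared as `ulong`, which is 32 bits
 * under ILP32 but 64 bits under LP64. */
struct app_node_info {
    uint32_t node_id;   /* fixed 32-bit width in both data models */
    uint32_t flags;     /* fixed 32-bit width in both data models */
};

/* Hypothetical helper: store a value that may arrive as an unsigned long
 * into the fixed-width field. Returns -1 instead of silently truncating. */
int app_set_node_id(struct app_node_info *info, unsigned long value)
{
    if (value > UINT32_MAX)
        return -1;
    info->node_id = (uint32_t)value;
    return 0;
}
```

An explicit range check makes any narrowing visible at run time, instead of relying only on the compiler warning mentioned above.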
When rebooting a master-eligible node on a running cluster, do not use the reboot command. Instead, use the init command as root user, as follows:
# init 6
Using the reboot command kills processes in an indeterminate order and therefore does not respect the required sequence for stopping services. This can lead to inconsistencies in data replication.
When a master-eligible node is reintegrated into the cluster (for example, after maintenance or failure), there is a period when disk partitions are resynchronizing. While a cluster is unsynchronized, the data on the master node disk is not fully backed up. It is unwise to schedule major tasks when the cluster is unsynchronized.
This section lists the known bugs and their workarounds where available. The bugs are described in the following subsections:
Cluster Membership Manager (CMM) Bugs
Carrier Grade Transport Protocol (CGTP) Bugs
Bug 4697437: Notifications of Diskless Node State Transitions Can Be Lost
--------------------------------------------------------------------------------------------
Notifications that describe the difference between an initial state and a final state are emitted by the CMM on the master node when the cluster membership changes. The CMM running on a diskless node can miss notifications for transitory states. For example, when a cluster passes through three states (CC1, CC2, and CC3), one notification should be emitted to describe the transition from CC1 to CC2, and another to describe the transition from CC2 to CC3. In this release of the product, a diskless node might receive only the notification for the overall transition from CC1 to CC3, missing the notification for the transient state CC2.
When a cluster passes from state CC1 to CC2, and then back to state CC1, the diskless node might not receive any notification.
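Each notification describes only the net difference between an initial and a final membership, so intermediate states can disappear. The following minimal sketch computes the nodes that joined or left between two states.

It models a cluster membership as a bitmask of node IDs. This representation is an assumption made for illustration; it is not how the CMM API represents membership.

The sketch shows why a CC1, CC2, CC1 sequence can produce no notification at all: the net difference between the first and last state is empty.

```c
#include <stdint.h>

/* Hypothetical representation: bit n set means node n is a cluster member.
 * 32 bits are enough for the supported maximum of 18 nodes. */
typedef uint32_t membership_t;

struct membership_diff {
    membership_t joined;  /* members of the final state only */
    membership_t left;    /* members of the initial state only */
};

/* Net difference between two membership states; intermediate states
 * between them are not visible in the result. */
struct membership_diff membership_compare(membership_t initial, membership_t final_state)
{
    struct membership_diff d;
    d.joined = final_state & ~initial;
    d.left   = initial & ~final_state;
    return d;
}

/* Nonzero when the net difference is non-empty. */
int membership_changed(struct membership_diff d)
{
    return (d.joined | d.left) != 0;
}
```

An application on a diskless node that keeps its own snapshot of the last membership it saw can compare against that snapshot, rather than assuming that every transition was delivered.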
Bug 4746183: Single Point of Failure Occurs Immediately After Switchover
-------------------------------------------------------------------------------------------
A single point of failure exists for a brief period after a switchover. The single point of failure lasts until Reliable NFS has received both of the following notifications from the CMM: MASTER_ELECTED and VICE_MASTER_ELECTED.
You can use the nhcmmstat tool to check which notifications have been received.
If the newly elected master node reboots before the notifications are received, see the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide for information about how to recover a cluster.
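An application that wants to follow the same condition can treat the window as a two-flag state. The following minimal sketch records the two notifications as bits and reports whether the window is still open.

The structure and bit names are hypothetical; they are not CMM API definitions. How an application receives CMM notifications is described in the CMM Programming Guide and is not shown here.

```c
/* Hypothetical event bits mirroring the notification names in the text;
 * they are not CMM API definitions. */
enum {
    SEEN_MASTER_ELECTED      = 1u << 0,
    SEEN_VICE_MASTER_ELECTED = 1u << 1
};

struct election_tracker {
    unsigned seen;   /* bitwise OR of the SEEN_* values received so far */
};

/* Start tracking a new switchover. */
void tracker_reset(struct election_tracker *t)
{
    t->seen = 0;
}

/* Record that one of the two notifications has arrived. */
void tracker_record(struct election_tracker *t, unsigned event)
{
    t->seen |= event;
}

/* Nonzero while at least one of the two notifications is still missing,
 * that is, while the single point of failure may still exist. */
int tracker_window_open(const struct election_tracker *t)
{
    const unsigned both = SEEN_MASTER_ELECTED | SEEN_VICE_MASTER_ELECTED;
    return (t->seen & both) != both;
}
```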
Bug 4740446: Switchover Is Initiated Even Though the CMM_FLAG_SYNCHRO_NEEDED Flag Is Set
---------------------------------------------------------------------------------------------------------------------------
There is a small time frame between the moment a command to change the synchronization state is issued at the API level and the moment when the nhcmmd daemon handles the command. If a switchover request is issued inside this time frame, the switchover request is accepted even if the cluster is no longer synchronized.
In this scenario, a call from Reliable NFS to clear the CMM_FLAG_SYNCHRO_NEEDED flag will fail because a switchover is in progress. Consequently, the master node will reboot and the replication will stop until the vice-master node is rebooted.
Verify that the CMM_FLAG_SYNCHRO_NEEDED flag is clear before requesting a switchover.
To recover from this problem, reboot the vice-master node.
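The following is a minimal sketch of the pre-switchover check.

It assumes the following:
- The master node's flags have already been read into a 32-bit mask. How they are read is not shown.
- The bit value used here is a placeholder. It is not the real CMM_FLAG_SYNCHRO_NEEDED definition.

A check like this narrows the interval described above but does not close it, because the flag can still change after the check.

```c
#include <stdint.h>

/* PLACEHOLDER bit value for illustration only. In an application, use
 * CMM_FLAG_SYNCHRO_NEEDED as defined by the CMM API headers. */
#define HYPOTHETICAL_SYNCHRO_NEEDED 0x1u

/* Hypothetical helper: nonzero when the synchronization-needed flag is
 * clear in the master node's flags, that is, when requesting a
 * switchover is reasonable. */
int switchover_precheck(uint32_t master_flags)
{
    return (master_flags & HYPOTHETICAL_SYNCHRO_NEEDED) == 0;
}
```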
Bug 4751051: Heartbeat of the Master Node Can Be Lost During Synchronization
---------------------------------------------------------------------------------------------------
During a full synchronization between the master node and the vice-master node, the following message might be displayed on the vice-master node console: master loss detected, but cannot switchover.
This message is generated because the network load prevents the vice-master node from detecting all of the master node heartbeats. The vice-master node can, therefore, conclude that the master node has failed.
Because the synchronization is in progress, the vice-master node cannot take the master role and there is no impact on the master node.
During periods when the vice-master node cannot detect the heartbeat of the master node, the synchronization is paused.
Bug 4749139: Library Clients Should Rely on Local Notifications Only
---------------------------------------------------------------------------------------
When a master-eligible node is elected as the vice-master node, the master node notifies the other peer nodes just before the data in the master node API module is updated.
As a consequence, the cmm_vicemaster_getinfo() function called on the master node can fail and return a CMM_ESRCH error, even though the CMM library clients on the other peer nodes have already received the CMM_VICEMASTER_ELECTED notification.
See the Netra High Availability Suite Foundation Services 2.1 6/03 CMM Programming Guide for more information.
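These release notes do not prescribe a retry. One possible defensive pattern, for a client on the master node, is to treat CMM_ESRCH as transient for a few attempts right after a CMM_VICEMASTER_ELECTED notification. This assumes the API module data is updated shortly afterwards.

The following is a minimal sketch of that pattern. Note that:
- The result codes and the lookup callback are hypothetical stand-ins. They do not reproduce the signature of cmm_vicemaster_getinfo().
- The pause between attempts is supplied by the caller.

```c
#include <stddef.h>

/* Hypothetical result codes standing in for CMM_OK and CMM_ESRCH;
 * use the real CMM API definitions in an application. */
enum sketch_result { SKETCH_OK = 0, SKETCH_ESRCH = 1, SKETCH_OTHER = 2 };

/* Hypothetical lookup callback standing in for a vice-master query such
 * as cmm_vicemaster_getinfo(); ctx carries caller-specific state. */
typedef enum sketch_result (*vicemaster_lookup)(void *ctx);

/* Optional pause between attempts, supplied by the caller (may be NULL). */
typedef void (*pause_fn)(void);

/* Repeat the lookup while it reports "not found", up to max_attempts
 * times. Any other result is returned immediately. */
enum sketch_result lookup_vicemaster_with_retry(vicemaster_lookup lookup, void *ctx,
                                                int max_attempts, pause_fn wait_between)
{
    enum sketch_result r = SKETCH_ESRCH;
    for (int i = 0; i < max_attempts; i++) {
        r = lookup(ctx);
        if (r != SKETCH_ESRCH)
            return r;
        if (wait_between != NULL && i + 1 < max_attempts)
            wait_between();
    }
    return r;
}

/* Demonstration stub: reports "not found" while *ctx > 0, then success. */
enum sketch_result demo_lookup(void *ctx)
{
    int *pending = ctx;
    if (*pending > 0) {
        (*pending)--;
        return SKETCH_ESRCH;
    }
    return SKETCH_OK;
}
```

The attempt limit keeps a persistent CMM_ESRCH, for example when no vice-master exists, from looping forever.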
Bug 4796226: cmm_mastership_release on Master Node Returns Incorrect Value if Vice-Master Is Out of the Cluster
--------------------------------------------------------------------------------------------------------------------------
When the cmm_mastership_release function is run on the master node, the function checks for the presence of the vice-master node. If the vice-master node is OUT_OF_CLUSTER, the function should return the CMM_ECANCELED value.
Instead, the function returns the CMM_ETIMEDOUT value.
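Because of this bug, a caller that needs to detect an out-of-cluster vice-master cannot rely on CMM_ECANCELED alone. The following is a minimal sketch that maps both codes to the same outcome.

The enumerations are hypothetical stand-ins for the CMM API return codes named above; use the real definitions from the CMM headers. A timeout can also have causes other than this bug, so the combined outcome means only that the vice-master was possibly out of the cluster.

```c
/* Hypothetical stand-ins for CMM_OK, CMM_ECANCELED, and CMM_ETIMEDOUT;
 * use the real CMM API definitions in an application. */
enum release_rc { RC_OK, RC_ECANCELED, RC_ETIMEDOUT, RC_OTHER };

enum release_outcome {
    OUTCOME_RELEASED,       /* mastership released */
    OUTCOME_NO_VICEMASTER,  /* vice-master possibly OUT_OF_CLUSTER */
    OUTCOME_FAILED          /* any other failure */
};

/* Map a mastership release return code to an outcome. Because of bug
 * 4796226, an out-of-cluster vice-master is reported as a timeout rather
 * than as a cancellation, so both codes map to OUTCOME_NO_VICEMASTER. */
enum release_outcome classify_release(enum release_rc rc)
{
    switch (rc) {
    case RC_OK:
        return OUTCOME_RELEASED;
    case RC_ECANCELED:
    case RC_ETIMEDOUT:
        return OUTCOME_NO_VICEMASTER;
    default:
        return OUTCOME_FAILED;
    }
}
```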
Bug 4854761: cmm_membership_remove on Master Node Causes Errors if Vice-Master Node Fails
-----------------------------------------------------------------------------------------------------------------------
If the vice-master node fails while the cmm_membership_remove function is running on the master node, the CMM_OK value is returned but the master node does not behave correctly.
The master node does the following:
Continues to be the master node
Does not detect that the vice-master node has failed
Does not detect any other changes in cluster membership
Bug 4845598: Diskless Node Emits CMM_INVALID_CLUSTER Notification When Master Is Disqualified
-----------------------------------------------------------------------------------------------------------------------------
When the master node is disqualified by the cmm_membership_qualif function, the nhcmmd daemon on an associated diskless node might emit a CMM_INVALID_CLUSTER notification.
Ignore the notification. The cluster is up and running.
Bug 4928087: Switchover Combined With Full Synchronization Generates a Duplicate Floating Address
--------------------------------------------------------------------------------------------------------
When you do a switchover (/opt/SUNWcgha/sbin/nhcmmstat -c so) in parallel with a full synchronization (/opt/SUNWcgha/sbin/nhcrfsadm -f) when the two master-eligible nodes are synchronized, the following series of events occurs:
As a result of this sequence of events, the Reliable NFS cannot set the master IP address to DOWN because this action cannot take place while a full synchronization is in progress.
If you encounter this problem, wait until the full synchronization is complete. This might take some time.
Bug 4624575: Clients Hang When the Vice-Master Node Is Stopped
-----------------------------------------------------------------------------------
Halting the vice-master node when clients are writing data to the master node might cause clients to hang for up to 16 seconds before continuing processing.
This problem does not occur when the vice-master node is shut down using the procedures described in the Netra High Availability Suite Foundation Services 2.1 6/03 Cluster Administration Guide.
Bug 4960188: NFS Client/Server Deadlock
--------------------------------------------------
The Solaris operating system does not support configurations in which an NFS client on a node accesses data exported by an NFS server on the same node. In this case, if the NFS client writes large files to the NFS server, the Solaris operating system can deadlock and the node can hang.
This situation can occur if an application on the vice-master fails over or is switched over to the master. The master node hangs and might not be able to function as a master. If you encounter this situation, reboot the hung node.
Bug 4964345: SNDR Sets Using Sector 0 Fail, and nhinstall/nhadm Do Not Detect This
-------------------------------------------------------------------------------------------------------
Do not use sector zero in any slice that will be replicated. If you do, the cluster hangs on the final step of SNDR synchronization. You might encounter this situation if you install more than one disk on a node using nhinstall.
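Because nhinstall and nhadm do not detect this condition, a pre-installation check can be scripted. The following is a minimal sketch.

It assumes the following:
- A hypothetical `slice_desc` structure has been filled in from the disk label. The real Solaris VTOC structures are not used here.
- A slice uses sector 0 exactly when its first sector is 0.

```c
#include <stddef.h>

/* Hypothetical slice description for illustration; on Solaris the real
 * layout comes from the disk label (VTOC). */
struct slice_desc {
    int  index;         /* slice number, for example 0 to 7 */
    long start_sector;  /* first sector of the slice */
    int  replicated;    /* nonzero if the slice is replicated by SNDR */
};

/* Hypothetical helper: count replicated slices that start at sector 0.
 * Any nonzero result means the disk should be repartitioned before
 * the slices are replicated. */
size_t count_replicated_at_sector_zero(const struct slice_desc *s, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i].replicated && s[i].start_sector == 0)
            bad++;
    return bad;
}
```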
Bug 5012570: Error During Switchover After Adding a Diskless Node (cannot unshare ...)
-----------------------------------------------------------------------------------------------------
If you add a new diskless node to the cluster using nhinstall, the following error message appears on the master node when you perform a switchover:
Mar 12 16:40:35 men252-1-cgtp nfs unshare:
/export/exec/Solaris_8_sparc.all/usr: not shared
This error occurs because the share command for the /usr directory of diskless nodes is defined twice in the nhfs.conf file.
An example of this entry in the nhfs.conf file is as follows:
RNFS.Share.1=share -F nfs -o ro /export/exec/Solaris_8_sparc.all/usr
If you encounter this error, remove one of the duplicate definitions in the nhfs.conf file.
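The following is a minimal sketch of a check that finds such duplicates.

It assumes the following:
- The nhfs.conf contents have already been split into lines.
- Duplicate definitions are textually identical after the "=" sign. Differences in whitespace are not normalized.

The function names are hypothetical and are not part of any Foundation Services tool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the command part of an
 * "RNFS.Share.N=<command>" line, or NULL if the line is not a share
 * definition. */
static const char *share_value(const char *line)
{
    const char *prefix = "RNFS.Share.";
    if (strncmp(line, prefix, strlen(prefix)) != 0)
        return NULL;
    const char *eq = strchr(line, '=');
    return eq ? eq + 1 : NULL;
}

/* Hypothetical helper: count share definitions whose command exactly
 * repeats the command of an earlier definition. */
size_t count_duplicate_shares(const char *const *lines, size_t n)
{
    size_t dups = 0;
    for (size_t i = 0; i < n; i++) {
        const char *vi = share_value(lines[i]);
        if (vi == NULL)
            continue;
        for (size_t j = 0; j < i; j++) {
            const char *vj = share_value(lines[j]);
            if (vj != NULL && strcmp(vi, vj) == 0) {
                dups++;
                break;      /* count each duplicate line once */
            }
        }
    }
    return dups;
}
```

A nonzero count indicates that one of the repeated RNFS.Share entries should be removed, as described above.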
Bug 4740370: CGTP Broadcast IREs Are Not Re-created After plumb or unplumb
-------------------------------------------------------------------------------------------------
Use of the ifconfig command to plumb or unplumb the CGTP interface is not supported. Using the ifconfig command in this way can lead to unexpected cluster outage.
Plumbing or unplumbing a single underlying interface can leave CGTP broadcasts inoperative. Broadcasts replicated by CGTP might not be delivered if one of the underlying incoming interfaces is down or, for the same reason, if that interface has been unplumbed. CGTP broadcasts cannot survive abrupt unplumbing and replumbing of the underlying network interfaces.
The only way for CGTP broadcasts to survive an ifconfig unplumb is to always respect the following sequence of operations:
Bug 4621703: Boot Server Allocates the Same IP Address to Two Diskless Nodes
---------------------------------------------------------------------------------------------------
When a diskless node is either stopped or disconnected from the network and then restarted or reconnected without rebooting the operating system, another diskless node booting at the same time can be allocated the first diskless node's IP address. This can happen in any of the following situations:
Bug 5063808: Recovery Response Differs From the Reference Manual
----------------------------------------------------------------------------------
The following anomalies exist between the expected and reported behaviour of nhpmd.
Many references to the SMCT installation tool have been removed from the Netra High Availability Suite Foundation Services 2.1 6/03 documentation set because this feature is not supported at the current patch level of the Foundation Services.
The product README delivered with the Netra High Availability Suite Foundation Services 2.1 6/03 contains an error: the patch T112281-02 is the Solaris 8 2/02 patch for CGTP. This patch is not for use on Solaris 8 PSR 1.
For the documentation set delivered with the current patch level of the product, you can no longer browse the contents of the /opt/SUNWcgha/doc/html directory through the index file:/opt/SUNWcgha/doc/html/index.html.
In PDF documents, figures might appear on the page after that referenced in the Table of Figures.
Copyright © 2004 Sun Microsystems, Inc. All rights reserved. Use is subject to license terms. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Sun, Sun Microsystems, the Sun logo, Java, JMX, Netra, Solaris, Sun StorEdge, docs.sun.com, Solaris JumpStart, Sun Fire, Javadoc, JDK, Sun4U, Jini, OpenBoot, Sun WorkShop, Solstice DiskSuite, and Forte are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Federal Acquisitions: Commercial Software - Government Users Subject to Standard License Terms and Conditions.