All cluster members must have a direct connection to all other members to facilitate communication among members and to provide a fast, reliable transport for passing messages throughout the cluster. This version of the TruCluster Server product supports the Memory Channel interconnect, a specialized interconnect designed specifically for the needs of clusters. The Memory Channel interconnect provides both broadcast and point-to-point connections between cluster members. Future releases will support other types of cluster interconnect.
The Memory Channel interconnect:
Allows a cluster member to set up a high-performance, memory-mapped connection to other cluster members. These other cluster members can, in turn, map transfers from the Memory Channel interconnect directly into their memory. A cluster member can thus obtain a write-only window into the memory of other cluster systems. Normal memory transfers across this connection can be accomplished at extremely low latency (3 to 5 microseconds). A brief usage sketch follows this list.
Has built-in error checking, virtually guaranteeing no undetected errors and allowing software error detection mechanisms, such as checksums, to be eliminated. The detected error rate is very low (on the order of one error per year per connection).
Supports high-performance mutual exclusion locking (by means of spinlocks) for synchronized resource control among cooperating applications.
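As an informal illustration of this transmit/receive model, the following C sketch attaches a small Memory Channel region twice: once for transmit (the write-only window) and once for receive, then writes through the transmit mapping and observes the transfer through the receive mapping. The imc_* routines and IMC_* flags belong to the Memory Channel API library described at the end of this section; the argument lists shown here are assumptions for illustration only, so see the TruCluster Server Highly Available Applications manual for the actual prototypes.

    #include <stdio.h>
    #include <sys/imc.h>                /* Memory Channel API library */

    #define REGION_KEY   123            /* clusterwide key agreed on by cooperating processes */
    #define REGION_SIZE  8192           /* size of the shared region, in bytes */

    int
    main(void)
    {
        imc_asid_t     region_id;
        caddr_t        tx_addr, rx_addr;
        volatile int  *tx, *rx;

        /* Initialize the Memory Channel API library for this process
         * (argument list assumed for illustration).
         */
        imc_api_init(NULL);

        /* Allocate a clusterwide region, then attach it twice: the transmit
         * attach yields the write-only window into the other members' memory;
         * the receive attach is where incoming transfers (including loopback
         * of this member's own writes) are mapped locally.
         */
        imc_asalloc(REGION_KEY, REGION_SIZE, IMC_URW, 0, &region_id);
        imc_asattach(region_id, IMC_TRANSMIT, IMC_SHARED, IMC_LOOPBACK, &tx_addr);
        imc_asattach(region_id, IMC_RECEIVE,  IMC_SHARED, 0,            &rx_addr);

        tx = (volatile int *)tx_addr;   /* write through this pointer only */
        rx = (volatile int *)rx_addr;   /* read through this pointer only  */

        tx[0] = 42;                     /* transfer is broadcast to every attached member */
        while (rx[0] != 42)             /* wait for the write to arrive locally */
            ;
        printf("transfer visible through the receive mapping\n");

        return 0;
    }

Error returns are omitted for brevity; real applications check each call and, as noted above, can rely on the interconnect's built-in error detection rather than software checksums.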
Figure 7-1 shows the general flow of a Memory Channel transfer.
Figure 7-1: Memory Channel Logical Diagram
A Memory Channel adapter must be installed in a PCI slot on each member system. A link cable connects the adapters. If the cluster contains more than two members, a Memory Channel hub is also required.
A redundant, multirail Memory Channel configuration can further improve reliability and availability. It requires a second Memory Channel adapter in each cluster member, and link cables to connect the adapters. A second Memory Channel hub is required for clusters containing more than two members.
The Memory Channel multirail model operates on the concept of physical rails and logical rails. A physical rail is defined as a Memory Channel hub, its link cables, the Memory Channel adapters, and the Memory Channel driver for those adapters on each node. A logical rail is made up of one or two physical rails.
A cluster can have one or more logical rails, up to a maximum of four. Logical rails can be configured in the following styles (a configuration example follows these descriptions):
If a cluster is configured in the single-rail style, there is a one-to-one relationship between physical rails and logical rails. This configuration has no failover properties; if the physical rail fails, the logical rail fails. Its primary use is for high-performance computing applications that use the Memory Channel application programming interface (API) library, not for highly available applications.
If a cluster is configured in the failover pair style, a logical rail consists of two physical rails, with one physical rail active and the other inactive. If the active physical rail fails, a failover takes place and the inactive physical rail is used, allowing the logical rail to remain active after the failover. This failover is transparent to the user. The failover pair style is the default for all multirail configurations.
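The rail style is selected clusterwide through a kernel attribute set identically on every member system. The stanza below is a sketch only: it assumes the attribute is rm_rail_style in the rm subsystem of /etc/sysconfigtab, with 1 selecting the failover pair style (the default) and 0 selecting the single-rail style. See the TruCluster Server Hardware Configuration manual for the authoritative attribute name and procedure.

    rm:
            rm_rail_style = 1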
A cluster fails over from one Memory Channel interconnect to another if a configured and available secondary Memory Channel interconnect exists on all member systems, and one of the following situations occurs in the primary interconnect:
More than ten errors are logged within one minute.
A link cable is disconnected.
The hub is turned off.
After the failover completes, the secondary Memory Channel interconnect becomes the primary interconnect. Another interconnect failover cannot occur until you fix the problem with the interconnect that was originally the primary.
If more than ten Memory Channel errors occur on any member system within a one-minute interval, the Memory Channel error recovery code determines whether a secondary Memory Channel interconnect is configured and available on each member system, and proceeds as follows (a simplified sketch of this decision appears after the list):
If a secondary Memory Channel interconnect exists on all member systems, the member system that encountered the error marks the primary Memory Channel interconnect as bad and instructs all member systems (including itself) to fail over to their secondary Memory Channel interconnect.
If any member system does not have a secondary Memory Channel interconnect configured and available, the member system that encountered the error displays a message indicating that it has exceeded the Memory Channel hardware error limit and panics.
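In outline, the recovery decision looks like the following C sketch. The function and helper names are hypothetical placeholders for internal driver logic that is not part of any documented interface; the sketch only restates the policy described above.

    #define MC_ERROR_LIMIT   10        /* errors tolerated per interval      */
    #define MC_ERROR_WINDOW  60        /* length of the interval, in seconds */

    /* Hypothetical sketch of the recovery policy; these helpers are
     * placeholders, not documented interfaces.
     */
    void
    mc_check_error_limit(void)
    {
        if (mc_errors_logged(MC_ERROR_WINDOW) <= MC_ERROR_LIMIT)
            return;                                 /* still within the error budget */

        if (all_members_have_secondary_rail()) {
            mark_primary_rail_bad();                /* primary is no longer used        */
            failover_all_members_to_secondary();    /* every member, including this one */
        } else {
            panic("Memory Channel hardware error limit exceeded");
        }
    }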
See the TruCluster Server Hardware Configuration manual for information on how to configure the Memory Channel interconnect in a cluster.
The Memory Channel API library implements highly efficient memory sharing between Memory Channel API cluster members, with automatic error handling, locking, and UNIX-style protections. See the TruCluster Server Highly Available Applications manual for a discussion of the Memory Channel API library.
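As a sketch of the locking interface, the fragment below serializes an update to a shared Memory Channel region with a one-element clusterwide spinlock set. The imc_lk* routines exist in the Memory Channel API library, but the argument lists and flag names used here are assumptions for illustration; the Highly Available Applications manual documents the actual interface.

    #include <sys/imc.h>                  /* Memory Channel API library */

    #define LOCK_KEY 456                  /* clusterwide key for the lock set */

    /* Update a counter that lives in a Memory Channel region, serialized
     * against the same update running on other cluster members.
     * (Argument lists are assumed for illustration.)
     */
    void
    bump_shared_counter(volatile int *tx_counter, volatile int *rx_counter)
    {
        imc_lkid_t lock_id;
        int        nlocks = 1;

        /* Create, or attach to, a one-element clusterwide lock set. */
        imc_lkalloc(LOCK_KEY, &nlocks, IMC_LKU, IMC_CREATOR, &lock_id);

        imc_lkacquire(lock_id, 0, 0, IMC_LOCKWAIT);  /* spin until lock 0 is held */
        *tx_counter = *rx_counter + 1;               /* read via the receive mapping,
                                                        write via the transmit mapping */
        imc_lkrelease(lock_id, 0);

        imc_lkdealloc(lock_id);
    }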