1    Introduction to TruCluster Server

TruCluster Server Version 5.0A is a highly integrated synthesis of Tru64 UNIX software, AlphaServer systems, and storage devices that operate as a single system. A TruCluster Server cluster acts as a single virtual system, even though it is made up of multiple systems. Members of the cluster can share resources, data storage, and clusterwide file systems under a single security and management domain, yet they can boot or shut down independently without disrupting the cluster's services to clients.

A TruCluster Server environment can be as simple or as feature-rich as you require. You configure a cluster that fits your needs, from a two-node cluster up to an eight-node cluster running high availability applications such as transaction processing systems, servers for network client/server applications, data-sharing applications that require maximum uptime, and distributed parallel processing applications that take full advantage of the TruCluster Server application programming interfaces (APIs).

TruCluster Server includes a cluster alias for the Internet protocol suite (TCP/IP) so that a cluster appears as a single system to its network clients and peers.

If you know how to manage a Tru64 UNIX system, you already know how to manage a TruCluster Server cluster, because TruCluster Server extends single-system management capabilities to clusters. It provides a clusterwide namespace for files and directories, including a single root (/) file system that all cluster members share. Similarly, it provides a clusterwide namespace for storage devices: each storage device has the same unique device name throughout the cluster.

The SysMan suite of graphical management utilities provides an integrated view of the cluster environment, letting you manage a single member or the entire cluster. Figure 1-1 shows the SysMan Station hardware view for a cluster named deli with three members: provolone, polishham, and pepicelli.

Figure 1-1:  A Cluster's View of Hardware

TruCluster Server preserves the following availability and performance features found in the TruCluster products provided for the Tru64 UNIX Version 4.0 series operating system:

TruCluster Server Version 5.0A provides the features listed in Table 1-1.

Table 1-1:  Features in the TruCluster Server Version 5.0A Product

Feature Description
Clusterwide namespace

The Cluster File System (CFS) supports a single clusterwide namespace and uniform coherent access to all file systems in a cluster. Context-dependent symbolic links (CDSLs) are used to maintain per-system configuration and data files within the shared CFS root (/), /usr, and /var file systems.

See Section 2.2 for more information on CFS. See Section 2.4 for more information on CDSLs.
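
The following Python sketch illustrates how a CDSL lets one shared path name lead to a different file on each member. It is an illustration only, not the CFS implementation. It assumes that a CDSL target contains a {memb} token that resolves to a member-specific directory name such as member1. The token, the /cluster/members path, and the function name resolve_cdsl are assumptions that this section does not state; see Section 2.4 for the actual behavior.

```python
# Illustrative sketch only; not part of TruCluster Server.
# Assumption: a CDSL target embeds a "{memb}" token that is replaced by
# "member<N>", where N is the ID of the member resolving the link.
MEMB_TOKEN = "{memb}"


def resolve_cdsl(target: str, member_id: int) -> str:
    """Return the member-specific path that a CDSL target refers to."""
    if member_id < 1:
        raise ValueError("member_id must be a positive integer")
    return target.replace(MEMB_TOKEN, f"member{member_id}")


# Hypothetical CDSL target for a per-member configuration file.
target = "/cluster/members/{memb}/etc/rc.config"

for member_id in (1, 2):
    # Each member follows the same link but reaches its own file.
    print(member_id, resolve_cdsl(target, member_id))
```

Every member sees the same link in the shared file system, and only the expanded target differs, which is how per-system files can exist inside a single clusterwide namespace.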

Clusterwide access to disk and tape storage

The device request dispatcher facility provides highly available clusterwide access to both character and block disk devices, as well as tape devices. All cluster disk and tape I/O passes through the device request dispatcher.

See Section 2.3 for more information on the device request dispatcher.

Clusterwide Logical Storage Manager (LSM)

The semantics of LSM have been extended to a cluster environment.

See Section 2.7 for more information on LSM in a cluster environment.

Connection manager

The connection manager ensures that all cluster members communicate with each other in order to control the formation and continued operation of a cluster. The connection manager calculates the votes required for quorum and decides when members are added to and removed from the cluster.

See Chapter 3 for more information on the connection manager.
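
The following Python sketch shows the kind of arithmetic involved in a quorum decision. This section does not give the formula. The sketch assumes the rule quorum votes = round_down((expected votes + 2) / 2), so verify the rule against Chapter 3 before relying on it. The function names quorum_votes and has_quorum are hypothetical.

```python
# Illustrative sketch only; not the connection manager's code.
# Assumption: quorum votes = round_down((expected votes + 2) / 2).


def quorum_votes(expected_votes: int) -> int:
    """Votes needed for quorum, given the expected total votes."""
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    return (expected_votes + 2) // 2  # integer division rounds down


def has_quorum(current_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present meet or exceed quorum."""
    return current_votes >= quorum_votes(expected_votes)


# Three members with one vote each: quorum is 2, so the cluster
# keeps quorum if one member fails but not if two fail.
print(quorum_votes(3))   # 2
print(has_quorum(2, 3))  # True
print(has_quorum(1, 3))  # False
```

Under this assumed rule, a two-vote cluster needs both votes to keep quorum, which illustrates why vote assignment matters in a two-node configuration.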

Cluster application availability (CAA)

The CAA facility provides resource monitoring and application restart capabilities. It provides the same type of application availability provided by user-defined services in the TruCluster Available Server Software and TruCluster Production Server Software products.

See Chapter 4 for a definition of the types of applications that can run in a cluster. See Chapter 5 for more information on CAA's role in making single-instance applications highly available.

In Version 5.0A, CAA introduces support for tape devices and media changers as monitored resources. It also extends existing resource types, for example by monitoring applications through the check entry point of a service's action script, and supports additional profile attributes, including restart attempts, failover delay, failover threshold and interval, autostart, and active placement.

Cluster alias

The cluster alias subsystem lets TCP and UDP applications address the cluster as though it were a single system. When the cluster is created, a default alias is defined that addresses all cluster members. A site can define additional aliases that address some or all cluster members.

See Chapter 6 for more information on cluster aliases.

For Version 5.0A, a virtual Media Access Control (vMAC) address can be assigned to an alias. When vMAC support is enabled, an alias vMAC address follows the alias's proxy ARP master from node to node as needed. Regardless of which cluster member is serving as the proxy ARP master for an alias, the alias's vMAC address does not change. For more information on vMAC, see Section 6.7.

The cluster alias subsystem's handling of ICMP packets has also been enhanced. Other changes include improved loopback performance and reduced use of distributed lock manager (DLM) locks for single-instance service selection.

Highly available NFS server using cluster alias

As shipped, the cluster is a highly available NFS server. CFS ensures that file systems exported from a TruCluster Server cluster are highly available to clients. Clients use the default cluster alias as the name of the NFS server when mounting file systems exported by the cluster.

See Section 6.3 for more information.

Memory Channel interconnect

The Memory Channel interconnect is a high-speed interconnect designed specifically for the needs of clusters. The Memory Channel interconnect provides both broadcast and point-to-point connections between cluster members.

TruCluster Server provides a Memory Channel application programming interface (API) library, which is the same as that provided in the TruCluster Production Server Software product.

See Chapter 7 for more information on the Memory Channel interconnect. See the TruCluster Server Highly Available Applications manual for a description of the Memory Channel API.

Internode Communication Subsystem (ICS) optimized for Memory Channel

In Version 5.0A, the Internode Communication Subsystem (ICS) Memory Channel Transport (MCT) provides higher performance, better scalability, and quicker failure detection than the previous TruCluster Server implementation of ICS that used TCP/IP over Memory Channel. ICS MCT takes full advantage of the features offered by Memory Channel for optimized data transfer and copy avoidance. It also takes advantage of Memory Channel features for failure detection rather than relying on TCP/IP timeouts.

In addition, the ICSNET network driver provides a network interface for the Memory Channel. Because the driver uses ICS to communicate with other cluster members, it is interconnect-independent.

Distributed lock manager (DLM)

TruCluster Server supports the DLM and its API, which is the same as that provided in the TruCluster Production Server Software product.

See Chapter 8 for a description of the DLM. See the TruCluster Server Highly Available Applications manual for a description of the DLM API.

Single-system management

Because a cluster uses CFS, all systems' configuration files are available for management. The SysMan suite of graphical management utilities provides an integrated view of the cluster environment, letting you manage a single member or the entire cluster.

See Chapter 9 for an overview of cluster installation and administration.

Rolling Upgrade/Patch

TruCluster Server Version 5.0A contains the software infrastructure required to support rolling upgrades and patches. Customers who install TruCluster Server Version 5.0A will be able to perform a rolling upgrade to subsequent TruCluster Server releases, and roll patches onto a Version 5.0A cluster.

Single Security Domain

Because a cluster uses CFS, there is a single copy of security administration files such as /etc/passwd and /etc/group. A user authenticated on one member has access to all members. A user with access to a file on one member has access to that file from any member. Access control lists (ACLs) are uniformly available to all members.

Expanded process IDs (PIDs)

PIDs are expanded to a full 32-bit value. PIDs are unique across a cluster. Each cluster member has a block of numbers that it assigns as PIDs.
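
The following Python sketch illustrates how giving each member its own block of numbers keeps PIDs unique across the cluster without any coordination between members. It is an illustration only. The block size of 2**19 and the mapping from member ID to block are assumptions, and the names PID_BLOCK_SIZE, pid_block, and owning_member are hypothetical; this section does not specify the actual layout.

```python
# Illustrative sketch only; not the kernel's PID allocator.
# Assumption: member N owns the PIDs from N * PID_BLOCK_SIZE up to
# (N + 1) * PID_BLOCK_SIZE - 1. The block size is hypothetical.
PID_BLOCK_SIZE = 2 ** 19


def pid_block(member_id: int, block_size: int = PID_BLOCK_SIZE) -> range:
    """Return the range of PIDs that the given member may assign."""
    if member_id < 1:
        raise ValueError("member_id must be a positive integer")
    start = member_id * block_size
    return range(start, start + block_size)


def owning_member(pid: int, block_size: int = PID_BLOCK_SIZE) -> int:
    """Return the ID of the member whose block contains this PID."""
    return pid // block_size


# Blocks for different members never overlap, so a PID assigned on
# one member cannot collide with a PID assigned on another.
block1, block2 = pid_block(1), pid_block(2)
print(block1[0], block1[-1])
print(block2[0], block2[-1])
print(owning_member(block2[0]))
```

Because a block is a contiguous range, any member can tell from a PID alone which member assigned it under this assumed layout.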