B Configuring Switches for a Highly Available LAN Interconnect

The recommended highly available LAN interconnect configuration includes two network adapters per member configured as a two-member redundant array of independent network adapters (NetRAIN) virtual interface and connected to two independent switches. Proper operation of NetRAIN in this configuration requires an interswitch link to carry its maintenance and failover traffic. In this no-single-point-of-failure (NSPOF) LAN interconnect configuration, no single failure of the interconnect hardware will disable the whole cluster. However, the failure of this interswitch link can, under certain circumstances, result in a network partition that can cause the removal of up to half of the members from the cluster. (See Section 6.3.3.)

We recommend that you configure an additional interswitch link between the switches to avoid this behavior. However, the second link creates a routing loop between the switches, so you must also configure the switches to avoid the packet-forwarding problems that the loop causes.

Typical switches provide at least one of the following three mechanisms to support parallel interswitch links. In order of decreasing desirability for cluster configurations, the mechanisms are:

Link aggregation: Treats multiple physical links as a single link and distributes packet traffic among them. (Section B.1)
Link resiliency: Treats multiple physical links as an active link and one or more standby links and fails over between them. (Section B.2)
Spanning Tree Protocol: Employs a distributed routing protocol to permit switches to cooperate to remove routing loops. This is an IEEE standard mechanism (IEEE 802.1d). (Section B.3)

The following sections discuss each of these in detail and describe the switch requirements and configuration options appropriate to each mechanism.

B.1 Link Aggregation

If it is supported, link aggregation (also known as port trunking) is the best available solution to implement parallel interswitch links for a highly available LAN interconnect. Using link aggregation, you group the ports on each switch that are cross-cabled to the ports on the other switch. Each set of ports makes up a single virtual link. Traffic between the two switches is sent across the physical links that make up the virtual link.

This configuration provides several benefits:

If any link or port in the virtual link fails, that physical link is disabled, but the other physical links that make up the virtual link continue to operate. The result is that there is no loss of connectivity between the two switches.

Failover is normally immediate.

Because each physical link can carry traffic between the two switches, the total available bandwidth between the switches may be greater than a single interswitch link can provide.

Note

Many switches, by default, use an algorithm based on the destination IP address or media access control (MAC) address of a specific packet of data to decide which physical port will carry it. That is, traffic between two systems over an interswitch link always uses the same physical link. Depending on which adapters are active, this might not result in increased bandwidth. Some switches allow the choice of a round-robin algorithm that distributes traffic evenly, regardless of destination. If the switches used for the LAN interconnect support such an algorithm, using it may result in more efficient use of the interswitch links. The lack of support for such an algorithm does not impact the fault tolerance of the aggregated link; it only reduces the potential performance benefit.
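
The difference between the two distribution algorithms described in the note can be illustrated with a minimal Python sketch. It is not any switch's actual algorithm: the CRC-based hash stands in for whatever vendor-specific function a switch applies to the destination address, and the MAC address and frame count are hypothetical.

```python
import zlib
from collections import Counter
from itertools import count


def pick_link_by_destination(dst_mac: str, n_links: int) -> int:
    """Destination-based selection: a given address always maps to the same link.

    CRC-32 is only a stand-in for a switch's own (vendor-specific) hash.
    """
    return zlib.crc32(dst_mac.lower().encode("ascii")) % n_links


def make_round_robin(n_links: int):
    """Return a selector that cycles through the links, ignoring the destination."""
    counter = count()

    def pick(_dst_mac: str) -> int:
        return next(counter) % n_links

    return pick


# Hypothetical traffic: eight frames, all addressed to one active adapter.
frames = ["08:00:2b:00:00:01"] * 8
n_links = 2

hashed = Counter(pick_link_by_destination(mac, n_links) for mac in frames)
round_robin = make_round_robin(n_links)
cycled = Counter(round_robin(mac) for mac in frames)

print("destination-based:", dict(hashed))  # all eight frames on a single link
print("round-robin:      ", dict(cycled))  # four frames on each link
```

With a single destination, the destination-based selector places every frame on one physical link, which is why it may not increase bandwidth. The round-robin selector spreads the same frames across both links.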

B.2 Link Resiliency

Some switches support link resiliency. If link aggregation is not supported, link resiliency is the next best option. Resilient links are specifically designed to support link failover. Typically, two links are involved: a main link and a standby link. Only the main link carries traffic between the two switches. When a failure is detected with the main link, the switches immediately start using the standby link. If the main link comes back on line, the switches may either start using the main link again, or they may continue using the standby link.

Like link aggregation, link resiliency supports a quick failover in the event of link failure. However, unlike link aggregation, only one link is in use at a time, so there is no increase in available bandwidth.
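
The main/standby behavior described above can be modeled with a short Python sketch. The class name `ResilientPair` and the `revert_to_main` flag are hypothetical; the flag only represents the two possible behaviors when the main link comes back, and real switches expose this choice (if at all) in their own way.

```python
from typing import Optional


class ResilientPair:
    """Toy model of a resilient link pair: one active link, one standby."""

    def __init__(self, revert_to_main: bool):
        # Assumption: whether traffic returns to the main link is switch-dependent.
        self.revert_to_main = revert_to_main
        self.up = {"main": True, "standby": True}
        self.active: Optional[str] = "main"

    def set_link(self, name: str, is_up: bool) -> Optional[str]:
        """Record a link state change and return the link now carrying traffic."""
        self.up[name] = is_up
        if self.active is None or not self.up[self.active]:
            # The active link was lost: use any link that is up, preferring main.
            self.active = next((n for n in ("main", "standby") if self.up[n]), None)
        elif self.revert_to_main and self.up["main"]:
            # The main link is available again and this pair is set to revert.
            self.active = "main"
        return self.active


sticky = ResilientPair(revert_to_main=False)
print(sticky.set_link("main", False))  # standby
print(sticky.set_link("main", True))   # standby (stays on the standby link)

reverting = ResilientPair(revert_to_main=True)
print(reverting.set_link("main", False))  # standby
print(reverting.set_link("main", True))   # main (returns to the main link)
```

In both variants only one link carries traffic at any time, which is why resilient links add fault tolerance but no bandwidth.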

B.3 Spanning Tree Protocol (STP)

If neither of the previous two options is supported, you can use parallel links between the switches if both switches support the Spanning Tree Protocol standard (IEEE 802.1d). This industry-wide standard is designed to detect and remove packet loops in a network. When STP is enabled between the switches, only one interswitch link is used. If that link fails, the switches reconfigure themselves and use the other interswitch link, similar to resilient links.

When using STP in a LAN interconnect, the switch must adhere to the following requirements:

The switch must allow STP to be disabled on a port-by-port basis. Some manufacturers who allow STP to be enabled or disabled only for the entire switch provide a mechanism (such as fast forwarding) to bypass the protocol on selected ports.

STP route-learning time must be configurable to be shorter than the cluster NetRAIN link failover time (10 seconds).

When configuring a switch capable of STP in a LAN interconnect, comply with the following rules:

Configure STP only on the ports that are used for the interswitch links. When some network cards are involved in a NetRAIN failover, they can trigger spanning tree reconfiguration if STP is enabled on their ports. The switches will drop packets during the spanning tree reconfiguration, which can result in a loss of connectivity for the node involved in the NetRAIN failover, even after the switches have finished the reconfiguration process. Consequently, spanning tree routing must be turned off on the ports of the switch that are connected to cluster members, and enabled only on those ports that are cross-cabled between the switches.
Spanning tree routing has no use on ports connected to end nodes, and can cause problems. However, not all switches support selectively enabling and disabling spanning tree routing per port. In those cases, use link aggregation or link resiliency to implement parallel links (these are preferable to STP anyway), or do not use parallel interswitch links at all.

Adjust the STP settings on the switches to minimize the time they spend in the reconfiguration process, because the switches drop packets while they are reconfiguring. Most switches allow three basic settings to be changed: hello time, forward delay, and maximum age. Set all three settings to their minimum values, which are normally 1 second for hello time, 4 seconds for forward delay, and 6 seconds for maximum age. These adjustments can help the switch recover more quickly in the event of the failure of an interswitch link.
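
As a rough sanity check of these settings, the following Python sketch validates a set of timer values against the parameter ranges and relationships defined by IEEE 802.1d, and compares an estimated reconfiguration time with the 10-second NetRAIN failover time. The function names are hypothetical, and the estimate is the commonly cited approximation for classic STP (two forward-delay periods, preceded by a maximum-age wait if the failure is not detected as a loss of link); actual recovery times depend on the switch.

```python
NETRAIN_FAILOVER_S = 10  # NetRAIN link failover time stated in this appendix

# Permitted ranges for the three timers, in seconds (IEEE 802.1d).
RANGES = {"hello": (1, 10), "forward_delay": (4, 30), "max_age": (6, 40)}


def check_stp_timers(hello: int, forward_delay: int, max_age: int) -> list:
    """Return a list of problems with the given timer values (empty if none)."""
    values = {"hello": hello, "forward_delay": forward_delay, "max_age": max_age}
    problems = []
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high} seconds")
    # Relationships the standard requires between the timers.
    if 2 * (forward_delay - 1) < max_age:
        problems.append("max_age is greater than 2 * (forward_delay - 1)")
    if max_age < 2 * (hello + 1):
        problems.append("max_age is less than 2 * (hello + 1)")
    return problems


def estimated_reconfiguration_s(forward_delay: int, max_age: int,
                                link_loss_detected: bool = True) -> int:
    """Approximate time until the standby interswitch link forwards traffic."""
    wait_for_aging = 0 if link_loss_detected else max_age
    # The port passes through the listening and learning states,
    # each lasting one forward-delay period.
    return wait_for_aging + 2 * forward_delay


for label, (hello, fwd, age) in {"minimum": (1, 4, 6), "default": (2, 15, 20)}.items():
    estimate = estimated_reconfiguration_s(fwd, age)
    print(label, check_stp_timers(hello, fwd, age), estimate,
          "OK" if estimate < NETRAIN_FAILOVER_S else "too slow")
```

Under these assumptions, the minimum values give an estimate of 8 seconds when the link failure is detected directly, which is below the NetRAIN failover time. The usual default values (2, 15, and 20 seconds) give 30 seconds, which is not.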