1    Introduction to Online Addition and Removal

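This chapter introduces the following topics: an overview (Section 1.1), component indictment and automatic deallocation (Section 1.2), Online Addition and Removal (Section 1.3), service tools (Section 1.4), and the memory troller (Section 1.5).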

1.1    Overview

This chapter introduces Online Addition and Removal (OLAR) of components and the features that work with it to increase a system's availability. These features help you maintain the availability of the system and its services by addressing the following factors: fault detection (Compaq Analyze); fault anticipation and avoidance (component indictment, automatic deallocation, and the memory troller); and recovery (Online Addition and Removal).

1.2    Component Indictment and Automatic Deallocation

Component indictment is a proactive error notification from a fault analysis utility, indicating that a component is experiencing a high incidence of correctable errors and should therefore be serviced. Indictment involves analyzing specific failure patterns in error log entries, either immediately or over a given time interval, and recommending a component's removal. The fault analysis utility signals the operating system that a given component is suspect, and the operating system then distributes the fault information through an indictment event. Interested applications, including SysMan Station and the Automatic Deallocation facility, can update their state information and take appropriate action if so configured.

The system administrator can configure the Automatic Deallocation facility of the operating system to automatically take an indicted component off line.
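The idea of analyzing error log entries over a given time interval can be illustrated with a small sliding-window counter. This is only a minimal sketch and not the algorithm used by Compaq Analyze or the operating system. The class name IndictmentTracker, the threshold of three errors, and the 60-second window are all hypothetical values chosen for illustration.

```python
from collections import defaultdict, deque


class IndictmentTracker:
    """Toy sliding-window counter (hypothetical; not the real fault analysis logic)."""

    def __init__(self, threshold=3, window_seconds=60.0):
        self.threshold = threshold            # assumed error count that triggers indictment
        self.window_seconds = window_seconds  # assumed length of the analysis interval
        self._events = defaultdict(deque)     # component name -> recent error timestamps

    def record_error(self, component, timestamp):
        """Record one correctable error; return True if the component looks suspect."""
        recent = self._events[component]
        recent.append(timestamp)
        # Discard errors that have aged out of the analysis interval.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.threshold


tracker = IndictmentTracker(threshold=3, window_seconds=60.0)
# Two early errors age out; three errors within 60 seconds trigger the last result.
results = [tracker.record_error("cpu2", t) for t in (0.0, 10.0, 100.0, 110.0, 120.0)]
print(results)
```

In this sketch a True result stands in for the point at which an indictment event would be raised. What happens next (for example, automatic deallocation) would depend on how the system is configured.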

For more information on Component Indictment and Automatic Deallocation, see Chapter 3.

1.3    Online Addition and Removal

Online Addition and Removal is the ability to add or remove critical system components while the operating system services and applications continue to run.

Online Addition and Removal management is used to expand capacity, upgrade components, and replace failed components without adversely affecting the availability of the system. This functionality, sometimes referred to as hot-swap, increases system uptime and availability during both scheduled and unscheduled maintenance. Starting with Tru64 UNIX V5.1A, CPU OLAR is supported.

For more information on Online Addition and Removal, see Chapter 4.

1.4    Service Tools

Applications in the WEBES suite of tools provide a core of common service tool functionality, including hardware diagnosis, operating system analysis, and system configuration and revision reporting. These tools have been integrated into the SysMan suite of tools to allow easy, centralized access. Additionally, the Compaq Analyze component of the WEBES suite is used in the Component Indictment process. For more information, see Chapter 5.

1.5    Memory Troller

The memory troller is an operating system mechanism that proactively locates and scrubs correctable memory errors. The memory troller systematically reads each memory location at a configurable rate. If it discovers a correctable memory error, it triggers the just-in-time scrubbing mechanism. For more information on memory trolling, see Chapter 6.
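The sweep-and-scrub behavior described above can be modeled in a few lines. This is a simulation under stated assumptions, not the kernel mechanism: the real troller works against physical memory and error-correcting hardware, while the SimulatedMemory class, the troll function, and the locations_per_tick rate parameter here are hypothetical names introduced only for illustration.

```python
class SimulatedMemory:
    """Stand-in for physical memory; tracks which addresses hold a correctable error."""

    def __init__(self, size, bad_addresses):
        self.size = size
        self.bad = set(bad_addresses)

    def read(self, address):
        """Return True if reading this location reports a correctable error."""
        return address in self.bad

    def scrub(self, address):
        """Model rewriting the corrected data, which clears the error."""
        self.bad.discard(address)


def troll(memory, locations_per_tick):
    """Read every location once, a fixed number per tick, scrubbing errors as found.

    Returns (ticks_used, scrubbed_addresses). The per-tick rate is an assumed
    way of expressing the configurable trolling rate.
    """
    scrubbed = []
    ticks = 0
    for start in range(0, memory.size, locations_per_tick):
        ticks += 1
        end = min(start + locations_per_tick, memory.size)
        for address in range(start, end):
            if memory.read(address):
                memory.scrub(address)  # scrub as soon as the error is discovered
                scrubbed.append(address)
    return ticks, scrubbed


memory = SimulatedMemory(size=10, bad_addresses=[3, 7])
ticks, scrubbed = troll(memory, locations_per_tick=4)
print(ticks, scrubbed)
```

A lower locations_per_tick value spreads the sweep over more ticks. In this model that stands for the general trade-off between trolling overhead and how quickly latent correctable errors are found.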