This chapter introduces the following topics:
Component Indictment and Automatic Deallocation (Section 1.2)
Online Addition and Removal (Section 1.3)
Service Tools (Section 1.4)
Memory Troller (Section 1.5)
This chapter introduces Online Addition and Removal (OLAR) of components and the features that interact with it to increase a system's availability. These features help you maintain the availability of the system and its services by addressing the following factors:
Minimizing scheduled and unscheduled down time for capacity expansion and component upgrades
Recovering from failures
Proactively identifying potential failures
The topics in this chapter can help minimize the impact on a system's availability by addressing fault detection (Compaq Analyze), fault anticipation and avoidance (component indictment, automatic deallocation, and the memory troller), and recovery (Online Addition and Removal).
1.2 Component Indictment and Automatic Deallocation
Component indictment is a proactive error notification from a fault analysis utility, indicating that a component is experiencing a high incidence of correctable errors and therefore should be serviced. Component indictment involves analyzing specific failure patterns from error log entries, either immediately or over a given time interval, and recommending a component's removal. The fault analysis utility signals the operating system that a given component is suspect. This causes the operating system to distribute the fault information through an indictment event. Interested applications, including SysMan Station and the Automatic Deallocation Facility, can update their state information and take appropriate action if so configured.
The system administrator can configure the Automatic Deallocation Facility of the operating system to take an indicted component off line automatically.
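The indictment flow described above can be sketched as a small simulation. This is only an illustration, not the logic that Compaq Analyze or the operating system uses: the class name `IndictmentMonitor`, its parameters, and the simple rule of counting correctable errors within a time window are all assumptions made for the example.

```python
from collections import defaultdict, deque


class IndictmentMonitor:
    """Hypothetical sketch: indict a component once it logs too many
    correctable errors within a sliding time window."""

    def __init__(self, threshold=3, window_seconds=3600.0, auto_deallocate=False):
        self.threshold = threshold              # errors needed to indict (assumed rule)
        self.window_seconds = window_seconds    # length of the analysis interval
        self.auto_deallocate = auto_deallocate  # stands in for the administrator's setting
        self._errors = defaultdict(deque)       # component -> recent error timestamps
        self.indicted = set()
        self.offline = set()

    def record_correctable_error(self, component, timestamp):
        """Log one correctable error; return True if this error causes an indictment."""
        window = self._errors[component]
        window.append(timestamp)
        # Discard entries that have aged out of the analysis interval.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.threshold and component not in self.indicted:
            self._indict(component)
            return True
        return False

    def _indict(self, component):
        # Stands in for the indictment event that interested applications receive.
        self.indicted.add(component)
        if self.auto_deallocate:
            # Automatic deallocation: take the indicted component off line.
            self.offline.add(component)
```

With `auto_deallocate=False`, the sketch only records the indictment, which corresponds to leaving the decision to take the component off line with the administrator.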
For more information on Component Indictment and Automatic Deallocation, see Chapter 3.
1.3 Online Addition and Removal
Online Addition and Removal is the ability to add or remove critical system components while the operating system services and applications continue to run.
Online Addition and Removal management is used to expand capacity, upgrade components, and replace failed components without adversely affecting the availability of the system. This functionality, sometimes referred to as hot-swap, provides the benefits of increased system up time and availability during both scheduled and unscheduled maintenance. Starting with Tru64 UNIX V5.1A, CPU OLAR is supported.
For more information on Online Addition and Removal, see Chapter 4.
1.4 Service Tools
Applications in the WEBES suite of tools provide a core of common service tool functionality, including hardware diagnosis, operating system analysis, and system configuration and revision reporting capabilities. These tools have been integrated into the SysMan suite of tools to allow easy, centralized access. Additionally, the Compaq Analyze component of the WEBES suite is used in the Component Indictment process. For more information, see Chapter 5.
1.5 Memory Troller
The memory troller is an operating system mechanism that proactively locates and scrubs correctable memory errors. The memory troller systematically reads each memory location at a configurable rate. If it discovers a correctable memory error, it triggers the just-in-time scrubbing mechanism. For more information on memory trolling, see Chapter 6.
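The trolling behavior described above can be sketched as follows. This is a simplified simulation under stated assumptions, not the operating system's implementation: `SimulatedMemory`, `troll`, and the `locations_per_tick` rate parameter are hypothetical names, and real scrubbing rewrites corrected data in hardware rather than clearing an entry in a set.

```python
class SimulatedMemory:
    """Hypothetical stand-in for physical memory with some correctable errors."""

    def __init__(self, size):
        self.size = size
        self.correctable_errors = set()  # addresses currently holding a correctable error

    def read(self, address):
        """Return True if reading this address reports a correctable error."""
        return address in self.correctable_errors

    def scrub(self, address):
        """Model scrubbing: the corrected data is written back, clearing the error."""
        self.correctable_errors.discard(address)


def troll(memory, start, locations_per_tick):
    """Read locations_per_tick consecutive locations, wrapping at the end of memory.

    Returns the address at which the next pass should resume and the list of
    addresses that were scrubbed during this pass.
    """
    scrubbed = []
    address = start
    for _ in range(locations_per_tick):
        if memory.read(address):
            memory.scrub(address)        # scrub as soon as the error is found
            scrubbed.append(address)
        address = (address + 1) % memory.size
    return address, scrubbed
```

Calling `troll` repeatedly, each time with the address returned by the previous call, walks all of memory; a larger `locations_per_tick` finds errors sooner at the cost of more read traffic per pass.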