5    Service Tools

This chapter describes how to use the service tools to help minimize system down time. These tools can notify service technicians of the state of your system, which makes recovery from errors easier and quicker.

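This chapter discusses the following topics: WEBES service applications and SysMan (Section 5.1), Compaq Analyze (Section 5.2), the Compaq Crash Analysis Tool (Section 5.3), Revision and Configuration Management (Section 5.4), the sys_check tool (Section 5.5), the collect tool (Section 5.6), a quick start for the service and monitoring applications (Section 5.7), and the crash dump and save core commands (Section 5.8).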

5.1    WEBES Service Applications and SysMan

Web-Based Enterprise Services (WEBES) provides a web-based suite that integrates hardware analysis software and revision management tools. The WEBES service tools are included on the operating system APCD or supplied by service personnel.

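The WEBES tools that are available for Tru64 UNIX are Compaq Analyze (CA), the Compaq Crash Analysis Tool (CCAT), and Revision and Configuration Management (RCM).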

Information on the WEBES service tools can be found at:

http://www.support.compaq.com/svctools/webes/index.html

After WEBES is installed, you can run these tools directly from the SysMan Station. In addition, RCM is available from the SysMan Menu.

See Chapter 4 for more information on the various applications and interfaces used to manage and monitor your system.

System Initiated Call Logging (SICL) can notify Compaq service personnel by logging a call with the service provider's customer service center through its Automatic Call Handling System. SICL is functional only if you also install DSNlink. Information on DSNlink can be found at http://www.support.compaq.com/dsnlink.

5.2    Compaq Analyze (CA)

Compaq Analyze can help minimize down time by providing proactive error notification of faulty hardware components. To avoid system panics, faulty components can be taken off line automatically by the Automatic Deallocation facility or manually by the administrator.

Compaq Analyze is a rules-based hardware fault management diagnostic application that provides error event analysis and translation. The multievent correlation analysis feature of Compaq Analyze provides analysis of events stored in the binary system event log or other specified binary log files.

By default, Compaq Analyze provides the analysis used to indict a component and notifies the operating system's component indictment facility. See Section 3.1 for more information on component indictment. Indictment support is available only in Compaq Analyze 4.0 or higher.

Compaq Analyze can be set up at installation to notify system administrators or service personnel automatically. By default, system administrators are notified by e-mail whenever Compaq Analyze detects a faulty component. Compaq service personnel can also be notified through System Initiated Call Logging (SICL), which logs a call with the service provider's customer service center through its Automatic Call Handling System. This notification is configurable in Compaq Analyze and is described fully in the Compaq Analyze documentation.

Compaq Analyze also provides an option to run in manual mode. Manually generated analysis by Compaq Analyze does not send automatic notification by indictment, e-mail, or SICL.

When Compaq Analyze is installed, the GUI can be launched directly from the SysMan Station by clicking on the host icon and selecting Compaq Analyze from the Tools menu.

5.3    Compaq Crash Analysis Tool (CCAT)

The Compaq Crash Analysis Tool (CCAT) can help minimize down time by suggesting ways to recover quickly from a system crash.

CCAT can be configured to send crash parameters or results files automatically to the Compaq Support Center (CSC). It can also send e-mail notification to system administrators.

This tool collects data that describes system crashes and matches the data against a set of operating system specific rules. The rules determine whether the footprint of the collected crash data matches any known crash footprint for which a solution or corrective action is known. This capability significantly reduces customer down time by shortening the time required to analyze system crashes.
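The matching step described above can be pictured with a short sketch. It is only an illustration of rule-based footprint matching, not CCAT's actual rule format or algorithm: the Rule fields, the crash dictionary keys, and the sample rule and crash data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """A known crash footprint and its corrective action (hypothetical layout)."""
    os_version: str      # rules are operating system specific
    panic_string: str    # substring expected in the panic message
    top_frames: tuple    # leading stack frames of the known footprint
    action: str          # solution or corrective action for this footprint


def match_footprint(crash: dict, rules: list) -> Optional[Rule]:
    """Return the first rule whose footprint matches the crash data, else None."""
    for rule in rules:
        if rule.os_version != crash["os_version"]:
            continue
        if rule.panic_string not in crash["panic_string"]:
            continue
        # Compare only as many frames as the rule specifies.
        frames = tuple(crash["stack"][: len(rule.top_frames)])
        if frames == rule.top_frames:
            return rule
    return None


# Made-up rule and crash data, for illustration only.
RULES = [
    Rule(
        os_version="V5.1",
        panic_string="kernel memory fault",
        top_frames=("panic", "trap", "example_driver_intr"),
        action="Install the (hypothetical) patch for example_driver.",
    ),
]

crash = {
    "os_version": "V5.1",
    "panic_string": "trap: kernel memory fault",
    "stack": ["panic", "trap", "example_driver_intr", "intr_dispatch"],
}

hit = match_footprint(crash, RULES)
print(hit.action if hit else "No known footprint; analyze the crash manually.")
```

A crash that matches no rule returns None, which corresponds to the case where manual analysis is still required.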

The CCAT graphical user interface (GUI) is an interactive tool used to analyze crash files manually. The CCAT GUI is used only for onsite manual tasks. It does not log calls or send crash parameters or results to the CSC, nor does it send e-mail notification to anyone.

When CCAT is installed, the GUI can be launched directly from the SysMan Station by clicking on the host icon and selecting CCAT from the Tools menu.

For specific installation and user details, see the Compaq Crash Analysis Tool User Guide at http://www.support.compaq.com/svctools/webes/webes_docs.html.

5.4    Revision & Configuration Management (RCM)

The Revision and Configuration Management (RCM) tool provides revision and configuration reporting for Compaq AlphaServer systems running Tru64 UNIX. Under normal circumstances the RCM application is used by Compaq Service Engineers and Compaq Support Center specialists to collect revision and configuration data from customer systems.

The types of reports that RCM can create are as follows:

After RCM is installed, it can be launched directly from the SysMan Menu by selecting the Support and Services branch and then selecting Configure the RCM Data Collector.

For specific installation and user details, see the Revision and Configuration Management Data Collector for Compaq Tru64 UNIX user guide at http://www.support.compaq.com/svctools/webes/webes_docs.html.

5.5    The sys_check Tool

The sys_check tool can help reduce system down time by as much as 50 percent by providing fast access to critical system data. Run a full check at least once a week to keep system data current. Some options take a long time to run and can affect system performance, so choose your options carefully and run them during off-peak hours. At a minimum, perform one full run (all data and warnings) as a postconfiguration task to identify configuration problems and establish a configuration baseline.

The sys_check tool is a system administration application for system census and configuration verification that also aids in diagnosing system errors and serviceability problems. Use the sys_check tool to create an HTML report of your system's configuration (software and hardware).

You can run the sys_check tool from the SysMan GUI applications or from the command line interface. For further information on using the sys_check tool, see the System Administration manual, the System Configuration and Tuning manual, and sys_check(8).

The sys_check tool also performs an analysis of operating system parameters and attributes such as those that tune the performance of the system. The report generated by the sys_check tool provides warnings if it detects problems with any current settings. While the sys_check tool can generate hundreds of useful warnings, it is not a complete and definitive check of the health of your system. The sys_check tool should be used in conjunction with event management and system monitoring tools to provide a complete overview and control of system status. See EVM(5) for more information on event management. See the System Administration manual for information on monitoring your system.

Running the sys_check tool only for warning information on possible configuration problems, or only for performance data, takes less time than the other options. We suggest that you run it this way once a week.
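One way to arrange a weekly off-peak run is a cron job. The following sketch builds a crontab entry for the `sys_check -perf -warn` invocation listed in Table 5-2. The /usr/sbin path, the redirection of the HTML report from standard output, and the report file name are assumptions made for illustration; confirm them against sys_check(8) on your system.

```python
def weekly_cron_line(command: str, hour: int = 2, minute: int = 0, weekday: int = 0) -> str:
    """Build a crontab entry that runs `command` once a week.

    crontab fields: minute hour day-of-month month day-of-week command
    (day-of-week 0 is Sunday).
    """
    if not (0 <= minute <= 59 and 0 <= hour <= 23 and 0 <= weekday <= 6):
        raise ValueError("minute, hour, or weekday out of range")
    return f"{minute} {hour} * * {weekday} {command}"


# Assumptions: sys_check is installed in /usr/sbin and writes its HTML report
# to standard output; the report path below is a placeholder.
command = "/usr/sbin/sys_check -perf -warn > /var/tmp/sys_check_weekly.html"

# Sunday at 02:00, chosen here as an example of an off-peak hour.
print(weekly_cron_line(command, hour=2, minute=0, weekday=0))
```

The printed line can be added to root's crontab with `crontab -e`. Adjust the hour and weekday to match the quiet period on your own system.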

After you perform OLAR operations, you can use the sys_check tool to check your system configuration, and you can use the analysis information to determine whether the operations you just performed introduced potential problems. The tool checks system components such as CPUs, provides performance data for those components, and outputs warnings and tuning guidelines that you can use to improve system performance.

5.6    The collect Tool

The collect tool is a system monitoring application that records or displays specific operating system and process data for a set of subsystems. You can configure the collect tool to start automatically when the system is rebooted. The collect tool can assist you in diagnosing performance problems, and your technical support service may request its report output when helping you solve system problems. See collect(8) and the System Administration manual for more information.

5.7    Service Applications and Monitoring Applications Quick Start

If you are familiar with the service applications that support the operating system, you can begin using them right away. Table 5-1 summarizes the applications.

Table 5-1:  Service Applications Quick Start

| Service Applications | Interface Used | Invoking the Application | Command Line |
|---|---|---|---|
| Compaq Analyze (CA) | SysMan Station | host icon --> Tools --> Compaq Analyze | /usr/sbin/ca |
| Compaq Crash Analysis Tool (CCAT) | SysMan Station | host icon --> Tools --> CCAT | /usr/sbin/ccat gui |
| Revision & Configuration Management (RCM) | SysMan Station and SysMan Menu | host icon --> Tools --> RCM | unisetup |
| sys_check tool | SysMan Station and SysMan Menu | sysman config_report or sysman escalation | sys_check -perf or sys_check -escalate |
| collect tool | SysMan Station and SysMan Menu | collect | /usr/sbin/collect |

5.7.1    Recommended Schedule and Use

When you use the service applications for fault diagnosis, they can reduce system down time and enhance system serviceability by providing fast access to critical system configuration data. Table 5-2 gives recommended guidelines for keeping system data current while balancing data needs against performance impact. Some applications take a long time to run and can affect system performance, so choose your applications carefully and run them during off-peak hours. At a minimum, perform one full run (all data and warnings) as a postconfiguration task to identify configuration problems and establish a configuration baseline.

Table 5-2:  Recommended Schedule and Use

| Service Application | Purpose and Use of Application | Frequency of Use |
|---|---|---|
| Compaq Analyze (CA) | Fault analysis and fault avoidance | Continually running |
| Compaq Crash Analysis Tool (CCAT) | Fault analysis | After a system crash |
| Revision & Configuration Management (RCM) | Generates system configuration information and analysis | As needed |
| sys_check -perf -warn | Generates system configuration information and analysis | Run weekly |
| sys_check | Generates system configuration information and analysis | Run at least once after installation and after major configuration changes |
| sys_check -all, or -escalate, or -noquick | Generates complete system configuration information and analysis | Run only when troubleshooting |
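The sys_check rows of Table 5-2 can be captured in a small lookup that returns the command line to run in a given situation. This is a minimal sketch: the situation names are labels invented for the example, and `-all` is chosen for troubleshooting although the table also lists `-escalate` and `-noquick`.

```python
# Option sets taken from Table 5-2; the dictionary keys are hypothetical labels.
SYS_CHECK_OPTIONS = {
    "weekly": ["-perf", "-warn"],
    "post_install": [],
    "config_change": [],
    "troubleshooting": ["-all"],  # Table 5-2 also lists -escalate and -noquick
}


def sys_check_argv(situation: str) -> list:
    """Return the sys_check argument vector recommended for a situation."""
    try:
        options = SYS_CHECK_OPTIONS[situation]
    except KeyError:
        raise ValueError(f"unknown situation: {situation!r}") from None
    return ["sys_check", *options]


for situation in SYS_CHECK_OPTIONS:
    print(f"{situation:16} {' '.join(sys_check_argv(situation))}")
```

Returning an argument list rather than a single string lets a caller pass the result directly to a process launcher without shell quoting.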

5.8    Crash Dump and Save Core Commands

The dumpsys and savecore commands can help you diagnose problems after a system crash.

The savecore command usually is invoked automatically during system startup. It determines whether a crash dump has been made and whether there is enough file system space to save it. See savecore(8), the System Administration manual, or the Kernel Debugging manual for more information.
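The space check can be illustrated with a short sketch. This is not savecore's actual logic; it only shows the kind of comparison involved. The function names and the idea of a reserve amount are assumptions for the example, and the dump size would have to come from your own estimate.

```python
import shutil


def has_room_for_dump(free_bytes: int, dump_bytes: int, reserve_bytes: int = 0) -> bool:
    """True if a dump of dump_bytes fits while leaving reserve_bytes free."""
    return free_bytes - dump_bytes >= reserve_bytes


def check_crash_directory(path: str, dump_bytes: int, reserve_bytes: int = 0) -> bool:
    """Apply the check to the file system that holds `path`."""
    free = shutil.disk_usage(path).free
    return has_room_for_dump(free, dump_bytes, reserve_bytes)


# Fixed numbers for illustration: a 512 MB dump with 1 GB held in reserve.
MB = 1024 * 1024
print(has_room_for_dump(2048 * MB, 512 * MB, reserve_bytes=1024 * MB))  # True
print(has_room_for_dump(1200 * MB, 512 * MB, reserve_bytes=1024 * MB))  # False
```

To check a real directory, call `check_crash_directory` with the directory where dumps are saved and an estimated dump size in bytes.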

The dumpsys command copies a snapshot of memory to a dump file without halting the system. This feature is useful for estimating crash dump size during dump configuration planning. See dumpsys(8), the System Administration manual, or the Kernel Debugging manual for more information.