This chapter describes how to use the various service tools to help minimize system down time. Using the service tools described, service technicians can be notified of the state of your system, making it easier and quicker to recover from errors.
This chapter discusses the following topics:
WEBES Service Tools integration with SysMan (Section 5.1)
Compaq Analyze (CA) (Section 5.2)
Compaq Crash Analysis Tool (CCAT) (Section 5.3)
Revision & Configuration Management (RCM) (Section 5.4)
The sys_check tool (Section 5.5)
The collect tool (Section 5.6)
Service Tools and Monitoring Applications Quick Start (Section 5.7)
Crash Dump and Save Core Applications (Section 5.8)
5.1 WEBES Service Applications and SysMan
Web-Based Enterprise Services (WEBES) provides a web-based suite that integrates hardware analysis software and revision management tools. The WEBES service tools are included on the operating system APCD or supplied by service personnel.
The WEBES tools that are available for Tru64 UNIX are:
Compaq Analyze (CA)
Compaq Crash Analysis Tool (CCAT)
Revision & Configuration Management (RCM)
Information on the WEBES service tools can be found at:
http://www.support.compaq.com/svctools/webes/index.html
After WEBES is installed, you can run these tools directly from the SysMan Station. In addition, RCM is available from the SysMan Menu.
See Chapter 4 for more information on the various applications and interfaces used to manage and monitor your system.
Compaq service personnel can be notified through System Initiated Call Logging (SICL), which logs a call to the service provider's customer service center through its Automatic Call Handling System. SICL is functional only if you also install DSNlink. Information on DSNlink can be found at http://www.support.compaq.com/dsnlink.
5.2 Compaq Analyze (CA)
Compaq Analyze can help minimize down time by providing proactive error notification of faulty hardware components. Faulty components can be taken off line automatically by the Automatic Deallocation facility or manually by the administrator to avoid system panics.
Compaq Analyze is a rules-based hardware fault management diagnostic application that provides error event analysis and translation. The multievent correlation analysis feature of Compaq Analyze provides analysis of events stored in the binary system event log or other specified binary log files.
By default, Compaq Analyze provides the analysis used to indict a component and notifies the operating system's component indictment facility. See Section 3.1 for more information on component indictment. Indictment support is only available in Compaq Analyze 4.0 or higher.
Compaq Analyze can be set up at installation for automatic notification of system administrators or service personnel. By default, system administrators are notified by e-mail whenever Compaq Analyze detects a faulty component. Compaq service personnel also can be notified through System Initiated Call Logging (SICL) to the service provider's customer service center through its Automatic Call Handling System. This is configurable in Compaq Analyze and is described fully in the Compaq Analyze documentation.
Compaq Analyze also provides an option to run in manual mode. Manually generated analysis by Compaq Analyze does not send automatic notification by indictment, e-mail, or SICL.
When Compaq Analyze is installed, the GUI can be launched directly from the SysMan Station by clicking on the Host Icon and selecting Compaq Analyze from the Tools menu.
5.3 Compaq Crash Analysis Tool (CCAT)
The Compaq Crash Analysis Tool (CCAT) can help minimize down time by potentially providing a user with ways of recovering from a system crash quickly.
The CCAT tool can be configured to automatically send crash parameters or results files to the Compaq Support Center (CSC). It also can send e-mail notification to system administrators.
This tool collects data that describes system crashes and matches the data against a set of operating system specific rules to determine if the footprint of the collected crash data matches any known crash data footprints for which a solution or corrective action is known. This capability significantly reduces customer down time by shortening the time required to analyze system crashes.
The CCAT graphical user interface (GUI) is an interactive tool used to analyze crash files manually. The CCAT GUI is used only for onsite manual tasks. It does not log calls or send crash parameters or results to the CSC, nor does it send e-mail notification to anyone.
When CCAT is installed, the GUI can be launched directly from the SysMan Station by clicking on the Host Icon and selecting CCAT from the Tools menu.
For specific installation and user details, see the Compaq Crash Analysis Tool User Guide at http://www.support.compaq.com/svctools/webes/webes_docs.html.
5.4 Revision & Configuration Management (RCM)
The Revision and Configuration Management (RCM) tool provides revision and configuration reporting for Compaq AlphaServer systems running Tru64 UNIX. Under normal circumstances the RCM application is used by Compaq Service Engineers and Compaq Support Center specialists to collect revision and configuration data from customer systems.
The types of reports that RCM can create are as follows:
Configuration Report - an inventory of the components on the target system, based on a single data collection.
Change Report - shows the difference between two data collections on the same system.
Comparison Report - shows the differences between data collections on two different systems.
Analysis Report - can generate either of the following analysis reports:
Patch analysis for Tru64 UNIX Version 4.0E, 4.0F, and 4.0G systems
Hardware revision analysis for AlphaServer ES40 systems
After RCM is installed, it can be launched directly from the SysMan Menu by selecting the Support and Services branch and then selecting Configure the RCM Data Collector.
For specific installation and user details, see the Revision and Configuration Management Data Collector for Compaq Tru64 UNIX user guide at http://www.support.compaq.com/svctools/webes/webes_docs.html.
5.5 The sys_check Tool
The sys_check tool can help reduce system down time by as much as 50 percent by providing fast access to critical system data. It is recommended that you run a full check at least once a week to maintain the currency of system data. However, some options take a long time to run and can impact system performance. You should therefore choose your options carefully and run them during off-peak hours. As a minimum, perform at least one full run (all data and warnings) as a postconfiguration task in order to identify configuration problems and establish a configuration baseline.
The sys_check tool is a system census and configuration verification tool that creates an HTML report describing your system's configuration (software and hardware). It also is used to aid in diagnosing system errors and serviceability problems.
You can run the sys_check tool from the SysMan GUI applications or from the command line interface. For further information on using the sys_check tool, see the System Administration manual, the System Configuration and Tuning manual, and sys_check(8).
The sys_check tool also performs an analysis of operating system parameters and attributes such as those that tune the performance of the system. The report generated by the sys_check tool provides warnings if it detects problems with any current settings. While the sys_check tool can generate hundreds of useful warnings, it is not a complete and definitive check of the health of your system. The sys_check tool should be used in conjunction with event management and system monitoring tools to provide a complete overview and control of system status. See EVM(5) for more information on event management. See the System Administration manual for information on monitoring your system.
Running the sys_check tool for warning information on possible configuration problems or for performance data takes less time than other options, and we suggest you do so once per week.
After you perform OLAR operations, you can use the sys_check tool to check your system configuration. You can use the analysis information to determine if there are potential problems with the operations you just performed.
The sys_check tool creates an HTML file that describes the system configuration, and aids you in diagnosing system errors and problems. The application checks system components such as CPUs and provides performance data for those system components. The sys_check tool outputs any warnings and tuning guidelines, which you can use to improve system performance.
5.6 The collect Tool
The collect tool is a system monitoring application that records or displays specific operating system and process data for a set of subsystems. You can configure the collect tool to automatically start when the system is rebooted. The collect tool can assist you in diagnosing performance problems, and its report output is requested by your technical support service when they are assisting you in solving system problems. See collect(8) and the System Administration manual for more information.
5.7 Service Applications and Monitoring Applications Quick Start
If you are familiar with the service applications that support the operating system, you can begin using them right away. Table 5-1 summarizes the applications.
Table 5-1: Service Applications Quick Start
Service Applications | Interface Used | Invoking the Application | Command Line |
Compaq Analyze (CA) | SysMan Station | host icon --> Tools --> Compaq Analyze | /usr/sbin/ca |
Compaq Crash Analysis Tool (CCAT) | SysMan Station | host icon --> Tools --> CCAT | /usr/sbin/ccat gui |
Revision & Configuration Management (RCM) | SysMan Station and SysMan Menu | host icon --> Tools --> RCM | unisetup |
sys_check tool | SysMan Station and SysMan Menu | sysman config_report or sysman escalation | sys_check -perf or sys_check -escalate |
collect tool | SysMan Station and SysMan Menu | collect | /usr/sbin/collect |
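Before relying on the command-line entry points in Table 5-1, it can be useful to confirm that they are installed. The following is a minimal sketch, not a supported procedure: it checks only the three absolute paths that appear in the table, and the function name check_service_tools and its optional root-prefix argument are hypothetical conveniences introduced here so that the check can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: report which of the command paths listed in Table 5-1 exist and are
# executable. The optional first argument is a hypothetical root prefix
# (empty by default) that is prepended to each path.
check_service_tools() {
    root="${1:-}"
    missing=0
    for tool in /usr/sbin/ca /usr/sbin/ccat /usr/sbin/collect; do
        if [ -x "${root}${tool}" ]; then
            echo "found:   ${tool}"
        else
            echo "missing: ${tool}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every listed tool was found.
    [ "$missing" -eq 0 ]
}
```

Calling check_service_tools with no argument checks the live system. A nonzero exit status means that at least one of the listed tools is not installed.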
5.7.1 Recommended Schedule and Use
When you use the service applications for fault diagnosis, the applications can reduce system down time and enhance system serviceability by providing fast access to critical system configuration data. Table 5-2 gives recommended guidelines that you can use to maintain the currency of system data while balancing data needs with performance impact. Note that some applications take a long time to run and can impact system performance. You should therefore choose your applications carefully and run them during off-peak hours. As a minimum, perform at least one full run (all data and warnings) as a postconfiguration task in order to identify configuration problems and establish a configuration baseline.
Table 5-2: Recommended Schedule and Use
Service Application | Purpose and Use of Application | Frequency of Use |
Compaq Analyze (CA) | Fault analysis and fault avoidance | Continually running |
Compaq Crash Analysis Tool (CCAT) | Fault analysis | After a system crash |
Revision & Configuration Management (RCM) | Generates system configuration information and analysis | As needed |
sys_check -perf -warn | Generates system configuration information and analysis | Run weekly |
sys_check | Generates system configuration information and analysis | Run at least once after installation and after major configuration changes |
sys_check -all, or -escalate, or -noquick | Generates complete system configuration information and analysis | Run only when troubleshooting |
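The sys_check rows of Table 5-2 can be expressed as a small lookup. The sketch below is illustrative only: the schedule names (weekly, baseline, troubleshoot) and both function names are hypothetical, the troubleshooting case arbitrarily picks -all from the three options that the table lists, and the script only prints the command line rather than running it. It also assumes that sys_check writes its HTML report to standard output so that the report can be redirected to a file; confirm this in sys_check(8) before using the printed command.

```shell
#!/bin/sh
# Sketch: map a schedule name (hypothetical labels) to the sys_check options
# listed in Table 5-2.
sys_check_options() {
    case "$1" in
        weekly)       echo "-perf -warn" ;;   # run weekly
        baseline)     echo "" ;;              # after installation or major changes
        troubleshoot) echo "-all" ;;          # one of -all, -escalate, -noquick
        *)            echo "unknown schedule: $1" >&2; return 1 ;;
    esac
}

# Print (do not execute) the command for a schedule and a report file.
# Redirecting standard output to the report file is an assumption.
print_sys_check_command() {
    opts=$(sys_check_options "$1") || return 1
    echo "sys_check${opts:+ $opts} > $2"
}
```

For example, print_sys_check_command weekly /var/adm/sys_check_weekly.html prints sys_check -perf -warn > /var/adm/sys_check_weekly.html, where the report path is a placeholder. Printing the command instead of executing it lets you review the options before scheduling the run for off-peak hours.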
5.8 Crash Dump and Save Core Commands
The dumpsys and savecore applications can help diagnose problems after a system crash.
The savecore command usually is invoked automatically during system startup. It determines whether a crash dump has been made, and if there is enough file system space to save it. See savecore(8), the System Administration manual, or the Kernel Debugging manual for more information.
The dumpsys command copies a snapshot of memory to a dump file, without halting the system. This feature is useful for estimating crash dump size during dump configuration planning. See dumpsys(8), the System Administration manual, or the Kernel Debugging manual for more information.
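After a crash and reboot, you might want to confirm that a dump was actually saved before starting analysis. The following sketch lists candidate dump files in a crash directory. It rests on assumptions that are not stated in this chapter: that saved dumps are in /var/adm/crash and are named vmcore.N or vmzcore.N. Check savecore(8) for the directory and file names used on your system. The function name list_saved_dumps is hypothetical.

```shell
#!/bin/sh
# Sketch: list saved crash dump files in a crash directory.
# Assumed (not from this chapter): the default directory /var/adm/crash and
# the file name patterns vmcore.* and vmzcore.*.
list_saved_dumps() {
    dir="${1:-/var/adm/crash}"
    count=0
    for f in "$dir"/vmcore.* "$dir"/vmzcore.*; do
        # An unmatched pattern stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        echo "$f"
        count=$((count + 1))
    done
    # Succeed only if at least one dump file was found.
    [ "$count" -gt 0 ]
}
```

A nonzero exit status means that no file matching the assumed patterns was found in the directory.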