System monitoring involves the use of basic commands and optional utilities to obtain baselines of operating parameters, such as the CPU workload or I/O throughput. You use these baselines to monitor, record, and compare ongoing system activity and ensure that the system does not deviate too far from your operational requirements.
Monitoring the system also enables you to predict and prevent problems that might make the system or its peripherals unavailable to users. Information from monitoring utilities enables you to react quickly to unexpected events such as system panics and disk crashes so that you can quickly resolve problems and bring the system back online.
The topic of monitoring is closely related to your technical support needs. Some of the utilities described in this chapter have a dual function. Apart from realtime system monitoring, they also collect historical and event-specific data that is used by your technical support representative. This data can be critical in getting your system up and running quickly after a fault in the operating system or hardware. Therefore, it is recommended that you at least follow the monitoring guidelines in Section 11.1.
Testing involves the use of commands and utilities to exercise parts of the system or peripheral devices such as disks. The available test utilities are documented in this chapter. Your system hardware also provides test utilities that you run at the console prompt. Refer to your Owner's guide for information on hardware test commands.
The following topics are covered in this chapter:
Section 11.1 contains basic monitoring guidelines and provides an overview of the utilities. It also provides pointers to related topics.
Section 11.2 describes some of the monitoring utilities in greater detail.
Section 11.3 describes environmental monitoring, which monitors aspects of system hardware status such as the temperature and whether the cooling fan is working. This feature depends on whether the hardware contains sensors that support such monitoring. Not all systems support this feature.
Section 11.4 describes how you use the test utilities for system components. Note that your system hardware also provides test routines. Refer to the Owner's Manual for more information.
If you need to obtain detailed information on the characteristics of system devices (such as disks and tapes), see the hwmgr command, documented in Chapter 5.
11.1 Overview of Monitoring and Testing
This section provides some general guidelines for monitoring your system,
and a brief overview of all the utilities that the operating system provides.
11.1.1 Guidelines for Monitoring Systems
Use the following procedure after you configure your system exactly as required for its intended operation:
Review the overview of monitoring utilities provided
in this section.
Based on the system configuration, select utilities that
meet the requirements of the configuration and your monitoring needs.
For example, if you have a graphics head terminal and you want to monitor several distributed systems, you might want to set up the SysMan Station. If you want to monitor a single local server, the dxsysinfo window might be adequate.
If applicable, set any attributes that trigger warnings and messages. For example, you might want to set a limit of 85% full on all file systems to prevent loss of data due to a full device.
Note
Many optional subsystems provide their own monitoring utilities. You should familiarize yourself with these interfaces and decide whether they are more appropriate than the generic utilities.
Run the sys_check utility with the -all option to:
Establish a no-load baseline.
Determine whether any system attributes need to be tuned.
If necessary, use the information from sys_check to tune system attributes. Refer to the System Configuration and Tuning guide for information on tuning your system. Store the baseline data where it can be easily accessed later, such as on another system. You might also want to print a copy of the report.
At an appropriate time, run the sys_check utility when the system is under a reasonable workload. Choose only those options that you want to monitor, such as -perf. This might have a small impact on system performance, so you might not want to run it during peak end-user demand.
Analyze the output from the sys_check utility and perform any additional recommended changes that meet your operational requirements. This might involve further tuning of system attributes or configuration changes such as the reallocation of system resources using a utility such as the Class Scheduler. See Section 11.2.2 for information on using the sys_check utility.
Configure the event management logging and reporting strategy for the system in conjunction with whatever monitoring strategy you employ. See Chapter 13 and Chapter 12 for information on how to configure EVM.
Set up any other monitoring utilities that you want to use. For example:
Configure the sys_check utility to run regularly during off-peak hours, using the runsyscheck script with the cron utility as described in Section 11.2.2. In the event of a system problem, the regularly updated report is useful when analyzing and troubleshooting the problem.
Note
Crash dump data might also be required when diagnosing system problems. See Chapter 14 for information on configuring the crash dump environment.
Install and configure any optional performance utilities, such as the Performance Manager. If supported by the target system, you should also configure environmental monitoring, as described in Section 11.3.
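The 85% file system limit mentioned in the procedure above can be checked with a few lines of script. The following Python sketch is illustrative only and is not part of the operating system. The function names percent_used and over_limit, and the mount point passed in the usage line, are assumptions made for this example. It computes used space as a share of total space, which can differ slightly from the figure that other utilities report.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Return used space as a percentage of total (0.0 for an empty device)."""
    return 100.0 * used / total if total else 0.0


def over_limit(mount_points, limit=85.0, usage=shutil.disk_usage):
    """Return (mount point, percent used) pairs at or above the limit.

    The usage argument is any callable that returns an object with
    total and used attributes; it defaults to shutil.disk_usage.
    """
    flagged = []
    for mount_point in mount_points:
        info = usage(mount_point)
        pct = percent_used(info.total, info.used)
        if pct >= limit:
            flagged.append((mount_point, round(pct, 1)))
    return flagged


if __name__ == "__main__":
    # "/" is only an example; list the file systems you want to watch.
    for mount_point, pct in over_limit(["/"]):
        print(f"WARNING: {mount_point} is {pct}% full")
```

Passing the usage function as a parameter keeps the threshold logic separate from the system call, so the check can be exercised with recorded figures before it is scheduled on a live system.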
11.1.2 Summary of Commands and Utilities
The operating system provides a number of monitoring commands and utilities. Some commands return a simple snapshot of system data in numerical format, while others have many options for selecting and filtering information. Also provided are complex graphical interfaces that filter and track system data in real time and display it on a graphics head terminal.
Choose monitoring utilities that best fit your local environment and monitoring needs and consider the following:
Using monitoring utilities can impact system performance.
To help diagnose problems in performance, such as I/O bottlenecks, a simple command such as iostat might be adequate.
To provide a quick visual check of resources on a single-user system, the X11 System Information interface (dxsysinfo) might be adequate.
Some utilities are restricted to the root user while others are accessible by all system users.
For enterprise-wide monitoring, the SysMan Station can display the health of many systems simultaneously on a single screen.
To track assets across an enterprise or verify what options are installed in what systems (and check whether they are functioning correctly), the web-based Insight Manager utility can be used for both UNIX servers and client PC systems.
You might need to provide output from a monitoring utility to your technical support site during problem diagnosis. It will greatly reduce your system downtime if you take a system baseline and establish a routine monitoring and data collection schedule before any problems occur.
The following sections describe the monitoring utilities.
11.1.2.1 Command-Line Utilities
Use the following commands to display a snapshot of various system statistics:
vmstat
The vmstat command displays system statistics for virtual memory, processes, traps, and CPU activity. An example of vmstat output is:
bigrig> vmstat
Virtual Memory Statistics: (pagesize = 8192)
  procs      memory         pages                            intr        cpu
  r   w   u  act  free wire fault  cow zero react  pin pout   in  sy  cs  us sy id
  2  97  20 8821   50K 4434  653K 231K 166K  1149 142K    0   76 250 194   1  1 98
Refer to the vmstat(1) reference page for more information.
iostat
The iostat command reports input and output information for terminals and disks and the percentage of time the CPU has spent performing various operations. An example of iostat output is:
bigrig> iostat
      tty     floppy0      dsk0          cpu
 tin tout    bps  tps    bps  tps    us ni sy id
   0    1      0    0      3    0     0  0  1 98
Refer to the iostat(1) reference page for more information.
who
The who command identifies the users who are currently logged in to the system, together with the terminal line and login time for each session. An example of who output is:
bigrig> who
root        console     Jan  3 09:55
root        :0          Jan  3 09:55
root        pts/1       Jan  3 09:55
bender      pts/2       Jan  3 14:59
root        pts/3       Jan  3 15:43
Refer to the who(1) reference page for more information. See also the users(1) reference page.
uptime
The uptime command reports how long the system has been running. Refer to the uptime(1) reference page for more information.
Refer also to the netstat command and the Network Administration guide for information on monitoring your network.
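Snapshot commands such as vmstat become more useful when their output is captured and compared against a baseline. The following Python sketch shows one way to pull the CPU columns out of captured vmstat text. It assumes the column layout of the sample in this section (the last three fields of the data row are us, sy, and id). The function names and the min_idle default are hypothetical.

```python
SAMPLE = """\
Virtual Memory Statistics: (pagesize = 8192)
  procs      memory         pages                            intr        cpu
  r   w   u  act  free wire fault  cow zero react  pin pout   in  sy  cs  us sy id
  2  97  20 8821   50K 4434  653K 231K 166K  1149 142K    0   76 250 194   1  1 98
"""


def cpu_fields(vmstat_text: str) -> dict:
    """Return the user, system, and idle CPU percentages from the last data row."""
    last_row = vmstat_text.strip().splitlines()[-1].split()
    us, sy, idle = (int(value) for value in last_row[-3:])
    return {"us": us, "sy": sy, "id": idle}


def is_busy(vmstat_text: str, min_idle: int = 10) -> bool:
    """Report whether idle time has dropped below min_idle percent (an assumed limit)."""
    return cpu_fields(vmstat_text)["id"] < min_idle


if __name__ == "__main__":
    print(cpu_fields(SAMPLE), "busy" if is_busy(SAMPLE) else "not busy")
```

Only the integer CPU columns are parsed here. Fields such as 50K carry a unit suffix and would need separate handling.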
11.1.2.2 SysMan Menu Monitoring and Tuning Tasks
The SysMan Menu provides options for several monitoring tasks. Refer to Chapter 1 for general information on using the SysMan Menu. The following options are provided under the Monitoring and Tuning menu item:
This option invokes the EVM event viewer, which is described in Chapter 13.
Invokes the interface that enables you to configure Insight Manager and start the Insight Manager daemon. Refer to Chapter 1 for information on configuring Insight Manager.
This is a SysMan Menu interface to the vmstat command, described previously in this section.
This is a SysMan Menu interface to the iostat command, described previously in this section.
This is a SysMan Menu interface to the uptime command, described previously in this section.
In addition, the following options are provided under the Support and Services menu item:
Invokes the escalation report feature of the sys_check utility. The escalation report is used only in conjunction with diagnostic services, and will be requested by your technical support organization. Refer to Section 11.2.2 for more information on using the escalation options in sys_check.
Invokes the system configuration report feature of the sys_check utility. Use this option to create a baseline record of your system configuration and to update the baseline at regular intervals. Note that using this option creates a full default report, which can take many minutes to complete and can impact system performance. Refer to Section 11.2.2 for more information on using the sys_check utility.
The SysMan Station provides a graphical view of one or more systems and also enables you to launch applications to perform administrative operations on any component. Refer to Chapter 1 for information on using the SysMan Station.
11.1.2.3 X11-Compliant Graphical Interfaces
The operating system provides System Management folders containing several graphical interfaces that are typically used under the default Common Desktop Environment (CDE) windowing environment. You can invoke these interfaces from the CDE Front Panel by clicking on the Application Manager icon to display the Application Manager folder. From this folder, select the System Admin icon, and then the MonitoringTuning icon. This folder provides icons that invoke the following SysMan Menu items:
This icon invokes a graphical interface to the system configuration report feature of the sys_check utility.
This icon invokes a graphical interface to the escalation report feature of the sys_check utility.
This icon invokes the interface that enables you to configure Insight Manager and start the Insight Manager daemon.
The remaining applications in this folder relate to system tuning. Refer to the System Configuration and Tuning guide for information on tuning using the Process Tuner (a graphical interface to the nice command) and the Kernel Tuner (dxkerneltuner).
The Tools folder provides graphical interfaces to commands such as vmstat. Invoke these interfaces from the CDE Front Panel by clicking on the Application Manager icon to display the Application Manager folder. From this folder, select the System Admin icon, and then the Tools icon. This folder provides the following interfaces:
This is a graphical interface to the iostat command, described previously in this section.
This is a graphical interface to the netstat command. Refer to the Network Administration guide for information on monitoring your network.
This is a graphical interface to the /var/adm/messages log file, which is used to store certain system messages according to the current configuration of system event management. For information on events, the messages they generate, and the message log files, refer to Chapter 12 and Chapter 13.
This is a graphical interface to the vmstat command, described previously in this section.
This is a graphical interface to the who command, described previously in this section.
The remaining X11-compliant monitoring application is located in the Application Manager - DailyAdmin folder. Click on the System Information (dxsysinfo) icon to launch the interface. This interface provides you with a quick view of the following system resources and data:
A brief description of the number and type of processors (CPUs).
The UNIX operating system version and the amount of available system memory.
Three dials indicating the approximate amount of CPU activity, in-use memory, and in-use virtual memory (swap). This information can also be obtained using commands such as vmstat.
Two warning buttons for files and swap. These buttons are filled with color when a file system is nearly full or if the amount of swap space is too low.
The current available space status of all local and remotely-mounted file systems. You can set a percentage limit here to trigger the warning indicators if available space falls below a certain percentage. Refer to Chapter 6 and Chapter 9 for information on increasing the available file system space.
11.1.2.4 Advanced Monitoring Utilities
The following utilities provide options that enable you to view and record many different operating parameters:
The collect utility enables you to sample many different kinds of system and process data simultaneously over a predetermined sampling time. You can collect information to data files and play the files back at the terminal. The collect utility can assist you in diagnosing performance problems, and its output may be requested by your technical support service when they are assisting you in solving system problems. Using the collect utility is described in Section 11.2.1.
The sys_check utility is a command-line interface that you use to create a permanent record of the system configuration and the current settings of many system attributes. This utility is described in detail in Section 11.2.2.
The Monitoring Performance History (MPH) utility is a suite of shell scripts that gathers information on the reliability and availability of the operating system and its hardware environment such as crash data files. This utility is described in detail in Section 11.2.3.
Performance Manager is an SNMP-based, user-extensible, real-time performance monitoring and management utility. It enables you to detect and correct performance problems on a single system or a cluster. Performance Manager has a graphical user interface (GUI) and a limited command-line interface that uses commands such as getone to read and display lines of data. The GUI can be configured to display tables and graphs, showing many different system parameters and values, such as CPU performance, physical memory usage, and disk transfers.
Performance Manager comprises two primary components: the Performance Manager GUI (pmgr) and the Performance Manager daemon (pmgrd). Additional daemons are used in monitoring TruCluster clusters (clstrmond) and the Advanced File System (advfsd), supplied in the AdvFS Utilities subset.
The Performance Manager software subsets are included on the Associated Products, Volume 2 CD-ROM. No license is required to install and use the software. For an overview of features, refer to the release notes. The PostScript file is PMGR***_RELNOTES.ps and the text file is PMGR***_RELNOTES.txt. The Performance Manager guide is provided on the Software Documentation CD-ROM.
The following topics are closely related to system monitoring and testing:
Refer to Chapter 10 for information on administering the system accounting services, which enables you to monitor and record access to resources such as printers.
Refer to Chapter 12 for instructions on configuring and using basic system event logging with the binlogd and syslogd event channels. This chapter also describes how you access system log files, where events and errors are recorded.
Refer to Chapter 13 for information on configuring and using the Event Manager (EVM), which provides sophisticated management of system events, including automated response to certain types of event.
Refer to the Network Administration guide for information on monitoring the system's networking components.
Refer to the System Configuration and Tuning guide for information on tuning your system in response to information gathered during monitoring and testing.
11.2 Configuring and Using Monitoring Utilities
This section introduces some of the monitoring utilities and discusses their setup and use. Refer to the documentation and reference pages supplied with each application for more information. Refer to Chapter 1 for information on configuring and using the SysMan Station to monitor systems that have a graphics environment.
A closely related topic is event management and error logging. Refer to Chapter 12 and Chapter 13 for information on these topics.
11.2.1 Using collect to Record System Data
The /usr/sbin/collect command-line utility collects data that describes the current system status. It enables you to select from many parameters, sort them, and time the data collection period. The data is displayed in real time or recorded to a file for future analysis or playback. Using the collect utility has a low CPU overhead because you can focus on the exact aspects of system behavior that you need to record, and therefore it should not adversely affect system performance.
The output from the unqualified /usr/sbin/collect command is similar to the output from monitoring commands such as vmstat, iostat, or netstat.
The command synopsis is fully defined in the collect(8) reference page. Important features provided by the collect utility are:
Controlling the duration of, and the rate at which, data is sampled.
Sorting the output according to processor usage.
Extracting a time slice of data from a data record file. For example, if you want to look at certain system parameters during the busiest time of use, you can extract that data from the data file using the -C option.
Specifying a particular device using its device special file name. For example, the following command specifies that data is collected from the named devices:
# collect -sd -Ddsk1,dsk10
Specifying a particular subsystem such as the CPU or the network. For example, the following command specifies that data is collected only for the CPUs, and a sample of data is shown:
# collect -e cf
CPU SUMMARY
 USER  SYS IDLE WAIT INTR SYSC   CS RUNQ AVG5 AVG30 AVG60 FORK VFORK
   13   16   71    0  149  492  725    0 0.13  0.05  0.01 0.30  0.00
SINGLE CPU STATISTICS
 CPU USER  SYS IDLE WAIT
   0   13   16   71    0
Recording and preserving a series of data files using the -H (history) option.
Compressing data files for economical storage.
Specifying specific users, groups, and processes for which data is to be sampled.
Using the -p option, you can specify multiple data files and use the collect utility to play them back as one stream. Using the -f option, you can combine multiple binary input files into one binary output file.
The collect utility locks itself into memory using the page locking function plock(), and cannot be swapped out by the system. It also raises its priority using the priority function nice(). If required, page locking can be disabled using the -ol command option and the priority setting can be disabled using the -on command option. However, using collect should have minimal impact on a system under high load.
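When collect output is saved as text, the CPU summary can be turned into named values for comparison against a baseline. The following Python sketch assumes the text layout of the CPU SUMMARY sample shown earlier in this section (a title line, a header line, then one row of values). The function name cpu_summary is hypothetical, and the output of a real collect run may be laid out differently.

```python
SAMPLE = """\
CPU SUMMARY
 USER  SYS IDLE WAIT INTR SYSC   CS RUNQ AVG5 AVG30 AVG60 FORK VFORK
   13   16   71    0  149  492  725    0 0.13  0.05  0.01 0.30  0.00
SINGLE CPU STATISTICS
 CPU USER  SYS IDLE WAIT
   0   13   16   71    0
"""


def cpu_summary(text: str) -> dict:
    """Map each CPU SUMMARY column name to its value as a float."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    start = lines.index("CPU SUMMARY")      # raises ValueError if the title is absent
    names = lines[start + 1].split()        # header row
    values = [float(v) for v in lines[start + 2].split()]  # data row
    return dict(zip(names, values))


if __name__ == "__main__":
    summary = cpu_summary(SAMPLE)
    print(f"idle {summary['IDLE']}%, 5-second load average {summary['AVG5']}")
```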
11.2.2 Using the sys_check Utility
The
sys_check
utility provides you with the following:
The ability to establish a baseline of system configuration information, both for software and hardware and record it in an easily accessible HTML report for web browsing. You can update this report regularly or as your system configuration changes.
The opportunity to perform automated checking of many system attributes (such as tuning parameters) and receive feedback on settings that might be more appropriate to the current use of the system.
The sys_check utility also checks and reports maintenance recommendations, such as installing patch kits and maintaining swap space.
The ability to generate a problem escalation report that can be used by your technical support service to diagnose and correct system problems.
In addition to recording the current hardware and software configuration, the sys_check utility produces an extensive dump of system performance parameters. This feature enables you to record many system attribute values, providing a useful baseline of system data. Such a baseline is particularly useful before you undertake major changes or perform troubleshooting procedures.
When you run the sys_check utility, it produces an HTML document on standard output. Used with the -escalate flag, the script produces /var/tmp/escalate* output files by default. These files can be forwarded to your technical support organization and used for diagnosing system problems and errors.
Use the following command to obtain a complete list of command options:
# /usr/sbin/sys_check -h
The output produced by the sys_check utility typically varies between 0.5MB and 3MB in size, and it can take from 30 minutes to an hour to complete the check. Refer to the sys_check(8) reference page for more details of the various command options. You can greatly reduce the run time by excluding items from the run. For example, the sys_check utility runs setld to record the installed software. Excluding the setld operation can greatly reduce the sys_check run duration.
You can also invoke standard sys_check run tasks as follows:
Using CDE, open the Application Manager from the CDE front panel. Select System_Admin and then MonitoringTuning. There are icons for two standard sys_check run tasks, Configuration Report and Escalation Report.
Using the SysMan Menu, expand the Support and Services menu item and choose from the following options:
For information on using the SysMan Menu, refer to Chapter 1.
You can run sys_check tasks automatically by enabling an option in the root crontabs file. In the /var/spool/cron/crontabs directory, the root file contains a list of default tasks that are run by cron on a regular basis. Remove the comment character (#) from the following line:
#0 3 * * 0 /usr/share/sysman/bin/runsyscheck
When this option is enabled, the resulting report is referenced by Insight Manager and can be read from the Insight Manager Configuration Report option. See Chapter 1 for information on using Insight Manager.
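In standard cron syntax, the fields 0 3 * * 0 schedule the runsyscheck script for 3:00am every Sunday. The following Python sketch shows the uncommenting step as a plain text transformation. It is an illustration only: the function name enable_cron_entry is hypothetical, and the sketch works on a string rather than on the live crontabs file, which you would normally change through the crontab command.

```python
def enable_cron_entry(crontab_text: str, command: str) -> str:
    """Remove the leading '#' from commented-out entries that run command."""
    result = []
    for line in crontab_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") and command in stripped:
            line = stripped[1:]  # drop only the comment character
        result.append(line)
    ending = "\n" if crontab_text.endswith("\n") else ""
    return "\n".join(result) + ending


if __name__ == "__main__":
    before = "#0 3 * * 0 /usr/share/sysman/bin/runsyscheck\n"
    print(enable_cron_entry(before, "/usr/share/sysman/bin/runsyscheck"), end="")
```

Lines that do not mention the command, including ordinary comments, pass through unchanged.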
11.2.3 Using the Monitoring Performance History Utility
The Monitoring Performance History (MPH) utility is a suite of shell scripts that gathers information on the reliability and availability of the operating system and its hardware environment such as crash data files. The information is automatically copied to your systems vendor by internet mail or DSN link, if available. Using this data, performance analysis reports are created and distributed to development and support groups. This information is only used internally by your systems vendor to improve the design of reliable and highly available systems.
The MPH run process is automatic, requiring no user intervention. Initial configuration requires approximately 10 minutes of your time. MPH will not impact or degrade your system's performance because it runs as a background task, using negligible CPU resource. The disk space required for the collected data and the application is approximately 300 blocks per system. This could be slightly higher in the case of a high number of errors and is considerably larger for the initial run, when a baseline is established (a one-time event).
The MPH utility operates as follows:
Every 10 minutes it records a timestamp indicating that the system is running.
Daily at 2:00am, it extracts any new event records from the default event log, /var/adm/binary.errlog.
Every day at 3:00am, it transfers the event and timestamp data and any new crashdc data files in /var/adm/crash to the system vendor. The average transfer is 150 blocks of data.
For more information, see http://availability.ayo.dec.com/cars or contact mph@Compaq.Com if you have specific questions.
Before running MPH, review the following information:
The Standard Programmer Commands (Software Development) OSFPGMR400 subset must be installed. Use the setld -i command to verify that the subset is installed.
The MPH software kit is contained in the mandatory base software subset OSFHWBASE400. This subset is installed automatically during the operating system installation.
Full documentation is located in /usr/field/mph/unix_installation_guide.ps. A text file is also supplied.
The disk space requirement for the MPH software subset is approximately 100 blocks.
To configure MPH on your system, you should be the root user and principal administrator of the target system. You need to supply your name, telephone number, and e-mail address. Complete the following steps:
Find the serial number (SN) of the target system, which is generally located on the rear of the system box. You need this number to complete the installation script.
Enter the following command to run the MPH script:
# /usr/field/mph/MPH_UNIX***.CSH
Where *** is the version number, such as 025.
Enter the information requested by the script. When the script is complete, MPH starts automatically.
If the operating system needs to be shut down for any reason, an orderly shutdown process must be followed. Otherwise, you will have to restart the MPH script as described in the MPH documentation. See the mph(1) reference page for more information.
11.3 Environmental Monitoring
On any system, thermal levels can increase because of poor ventilation, overheating conditions, or fan failure. Without detection, an unscheduled shutdown could ensue, causing loss of data or damage to the system itself. By using Environmental Monitoring, the thermal state of AlphaServer systems can be detected and users can be alerted in enough time to recover or perform an orderly shutdown of the system.
The Environmental Monitoring framework consists of four components:
The loadable kernel module and its associated APIs.
The Server System MIB subagent daemon.
The envmond daemon.
The envconfig utility.
These components are described in the following sections.
11.3.1 Loadable Kernel Module
The loadable kernel module and its associated APIs contain the parameters needed to monitor and return status on your system's threshold levels. The kernel module exports server management attributes as described in Section 11.3.1.1 through the kernel configuration manager (CFG) interface only. It works across all platforms that support server management, and provides compatibility for other server management systems under development.
The loadable kernel module does not include platform-specific code (such as the location of status registers). It is transparent to the kernel module which options are supported by a platform. That is, the kernel module and platform are designed to return valid data if an option is supported, a fixed constant for unsupported options, or null.
11.3.1.1 Specifying Loadable Kernel Attributes
The loadable kernel module exports the parameters listed in Table 11-1 to the kernel configuration manager (CFG).
Table 11-1: Parameters Defined in the Kernel Module
Parameter | Purpose |
env_current_temp | Specifies the current temperature of the system. If a system is configured with the KCRCM module, the temperature returned is in Celsius. If a system does not support temperature readings and a temperature threshold has not been exceeded, a value of -1 is returned. If a system does not support temperature readings and a temperature threshold is exceeded, a value of -2 is returned. |
env_high_temp_thresh | Provides a system-specific operating temperature threshold. The value returned is a hardcoded, platform-specific temperature in Celsius. |
env_fan_status | Specifies a noncritical fan status. The value returned is a bit value of zero (0). This value will differ when the hardware support is provided for this feature. |
env_ps_status | Provides the status of the redundant power supply. On platforms that provide interrupts for redundant power supply failures, the corresponding error status bits are read to determine the return value. A value of 1 is returned on error; otherwise, a value of zero (0) is returned. |
env_supported | Indicates whether or not the platform supports server management and environmental monitoring. |
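The special return values in Table 11-1 are easy to misread as real temperatures, so a script that consumes them should decode them first. The following Python sketch shows one way to do this. The function names describe_temp and describe_ps are hypothetical, and how the values are read from the kernel module is outside the scope of the sketch.

```python
def describe_temp(env_current_temp: int) -> str:
    """Translate an env_current_temp value into a readable status."""
    if env_current_temp == -1:
        # Readings unsupported, and no temperature threshold has been exceeded.
        return "temperature readings unsupported; no threshold exceeded"
    if env_current_temp == -2:
        # Readings unsupported, but a temperature threshold has been exceeded.
        return "temperature readings unsupported; threshold exceeded"
    return f"{env_current_temp} degrees Celsius"


def describe_ps(env_ps_status: int) -> str:
    """Translate an env_ps_status value: 1 indicates an error, 0 indicates none."""
    return "power supply error" if env_ps_status == 1 else "power supply OK"


if __name__ == "__main__":
    for value in (35, -1, -2):
        print(value, "->", describe_temp(value))
```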
11.3.1.2 Obtaining Platform-Specific Functions
The loadable kernel module must return environmental status based on the platform being queried. To obtain environmental status, the get_info() function is used. Calls to the get_info() function are filtered through the platform_callsw[] table.
The get_info() function obtains dynamic environmental data using the function types described in Table 11-2.
Table 11-2: get_info() Function Types
Function Type | Use of Function |
GET_SYS_TEMP | Reads the system's internal temperature on platforms that have a KCRCM module configured. |
GET_FAN_STATUS | Reads fan status from error registers. |
GET_PS_STATUS | Reads redundant power supply status from error registers. |
The get_info() function obtains static data using the HIGH_TEMP_THRESH function type, which reads the platform-specific upper threshold operational temperature.
11.3.1.3 Server System MIB Subagent
The Server System MIB Agent (which is an eSNMP subagent) is used to export a subset of the Environmental Monitoring parameters specified in the Server System MIB. The Server System MIB exports a common set of hardware-specific parameters across all server platforms, depending on the operating system installed.
Table 11-3 maps the subset of Server System MIB variables that support Environmental Monitoring to the kernel parameters described in Section 11.3.1.1.
Table 11-3: Mapping of Server Subsystem Variables
Server System MIB Variable Name | Kernel Module Parameter |
svrThSensorReading | env_current_temp |
svrThSensorStatus | env_current_temp |
svrThSensorHighThresh | env_high_temp_thresh |
svrPowerSupplyStatus | env_ps_status |
svrFanStatus | env_fan_status |
An SNMP MIB compiler and other utilities are used to compile the MIB description into code for a skeletal subagent daemon. Communication between the subagent daemon and the master agent eSNMP daemon, snmpd, is handled by interfaces in the eSNMP shared library (libesnmp.so). The subagent daemon must be started when the system boots and after the eSNMP daemon has started.
For each Server System MIB variable listed in Table 11-3, code is provided in the subagent daemon, which accesses the appropriate parameter from the kernel module through the CFG interface.
11.3.2 Monitoring Environmental Thresholds
To monitor the system environment, the envmond daemon is used. You can customize the daemon by using the envconfig utility. The following sections discuss the daemon and utility. For more information, see the envmond and envconfig reference pages.
11.3.2.1 Environmental Monitoring Daemon
The Environmental Monitoring daemon, envmond, checks threshold levels so that corrective action can be taken before damage occurs to your system. The envmond daemon performs the following tasks:
When the cooling fan on an AlphaServer 1000A fails, the kernel logs the error, synchronizes the disks, then powers down the system. On all other fan failures, a hard shutdown ensues.
Notifies users when a high temperature threshold condition has been resolved.
Notifies all users that an orderly shutdown is in progress if recovery is not possible.
To query the system, the
envmond
daemon uses the
base operating system command
/usr/sbin/snmp_request
to obtain the current values
of the environment variables specified in the Server System MIB.
To enable Environmental Monitoring, the
envmond
daemon must be started during the system boot, but after the eSNMP
and Server System MIB agents have been started.
You can customize the
envmond
daemon using the
envconfig
utility.
11.3.2.2 Customizing the envmond Daemon
You can use the
envconfig
utility to customize how
the environment is queried by the
envmond
daemon.
These
customizations are stored in the
/etc/rc.config
file, which is read by the
envmond
daemon during startup.
Use the
envconfig
utility to perform
the following tasks:
Turn environmental monitoring on or off during the system boot.
Specify how frequently the envmond daemon queries the system.
Set the highest threshold level that can be encountered before
a temperature event is signaled by the
envmond
daemon.
Specify the path of a user-defined script that you want the
envmond
daemon to execute when a high threshold level is encountered.
Specify the grace period allotted to save data if a shutdown message has been broadcast.
Display the values of the Environmental Monitoring variables.
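As a rough illustration of the user-defined script mentioned in the list above, the following sketch appends a timestamped line to a log file. The log path, the ENVMON_EVENT_LOG variable, and the function name are hypothetical. The sketch also assumes that the envmond daemon runs the script without arguments; see the envmond and envconfig reference pages for the actual invocation details.

```sh
#!/bin/sh
# Hypothetical user-defined script for the envmond daemon.
# Assumption: the script is run without arguments when a high
# temperature threshold is encountered.

# Hypothetical log location; override it by setting ENVMON_EVENT_LOG.
ENVMON_EVENT_LOG=${ENVMON_EVENT_LOG:-/tmp/envmon_events.log}

# Append one timestamped line that names the host.
record_high_temp_event() {
    printf '%s high temperature threshold reached on %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" >> "$ENVMON_EVENT_LOG"
}

record_high_temp_event
```

A script of this kind should finish quickly and avoid interactive prompts, because it runs while the system might be approaching a shutdown.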
11.3.3 User-Definable Messages
Messages broadcast or logged by the Environmental Monitoring utility can be modified. The messages are located in the following file:
/usr/share/sysman/envmon/EnvMon_UserDefinable_Msg.tcl
You
must be root to edit this file and you can edit any message text included
in braces ({}).
The instructions for editing each section of the file are
included in the comment fields, preceded by the
#
symbol.
For example, the following message provides samples of possible causes for the high temperature condition:
set EnvMon_Ovstr(ENVMON_SHUTDOWN_1_MSG) {System has reached a \
high temperature condition. Possible problem source: Clogged \
air filter or high ambient room temperature.}
You could modify this message text as follows:
set EnvMon_Ovstr(ENVMON_SHUTDOWN_1_MSG) {System \
has reached a high temperature condition. Check the air \
conditioning unit}
Note that you should not alter any data in this file other than the text strings between the braces ({}).
11.4 Using System Exercisers
The operating system provides a set of exercisers that you can use to troubleshoot your system. The exercisers test specific areas of your system, such as file systems or system memory. The following sections provide information on the system exercisers:
Running the system exercisers (Section 11.4.1)
Using exerciser diagnostics (Section 11.4.2)
Exercising file systems by using the
fsx
command (Section 11.4.3)
Exercising system memory by using the
memx
command (Section 11.4.4)
Exercising shared memory by using the
shmx
command (Section 11.4.5)
Exercising disk drives by using the
diskx
command (Section 11.4.6)
Exercising tape drives by using the
tapex
command (Section 11.4.7)
Exercising communications systems by using the
cmx
command (Section 11.4.8)
In addition to the exercisers documented
in this chapter, your system might also support the DEC Verifier and Exerciser
Tool (VET), which provides a similar set of exercisers.
Refer to the documentation
that came with your latest firmware CD-ROM for information on VET.
11.4.1 Running System Exercisers
To run a system exerciser, you must be logged in
as superuser and
/usr/field
must be your current directory.
The commands that invoke the system exercisers provide an option for specifying a file where diagnostic output is saved when the exerciser completes its task.
Most of the exerciser commands have an online help option that displays
a description of how to use that exerciser.
To access online help, use the
-h
option with a command.
For example, to access help for
the
diskx
exerciser, use the following command:
#
diskx -h
You can run the exercisers in the foreground or the background and can cancel them at any time by pressing [Ctrl/c] in the foreground. You can run more than one exerciser at the same time; keep in mind, however, that the more processes you have running, the slower the system performs. Thus, before exercising the system extensively, make sure that no other users are on the system.
There are some restrictions when you run a system exerciser over an
NFS link or on a diskless system.
For exercisers such as
fsx
that need to write to a file system, the target file system must be writable
by root.
Also, the directory from which an exerciser is executed must be
writable by root because temporary files are written to the directory.
These restrictions can be difficult to adhere to because NFS file systems
are often mounted in a way that prevents root from writing to them.
You can
overcome some of these problems by copying the exerciser into another directory
and running it from the new directory.
11.4.2 Using Exerciser Diagnostics
When an exerciser is halted (either by pressing [Ctrl/c] or by timing out), diagnostics are displayed and are stored in the exerciser's most recent log file. The diagnostics inform you of the test results.
Each time an exerciser is invoked, a new log file is created in the
/usr/field
directory.
For example, when you execute the
fsx
command for the first time, a log file named
#LOG_FSX_01
is created.
The log files contain records of each exerciser's results
and consist of the starting and stopping times, and error and statistical
information.
The starting and stopping times are also logged into the default
system error log file,
/var/adm/binary.errlog
.
This file
also contains information on errors reported by the device drivers or by the
system.
The log files provide a record of the diagnostics. However, after reading a log file, you should delete it because an exerciser can have only nine log files. If you attempt to run an exerciser that has accumulated nine log files, the exerciser tells you to remove some of the old log files so that it can create a new one.
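Before you start an exerciser, you can check how many of its log files have accumulated. The following is a minimal sketch. The count_exerciser_logs function name is hypothetical, and the sketch assumes that every exerciser names its log files with the #LOG_NAME_nn pattern shown for fsx (#LOG_FSX_01).

```sh
#!/bin/sh
# Count the log files of one exerciser in a directory.
# Assumption: log files are named "#LOG_<NAME>_<nn>", as in #LOG_FSX_01.
count_exerciser_logs() {
    dir=$1
    name=$2
    count=0
    for f in "$dir"/"#LOG_${name}_"*; do
        # An unmatched pattern stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# Warn when fsx has reached the nine-file limit in /usr/field.
if [ "$(count_exerciser_logs /usr/field FSX)" -ge 9 ]; then
    echo "Remove old fsx log files from /usr/field before running fsx."
fi
```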
If an exerciser finds errors, you can determine which device or area
of the system has the difficulty by looking at the
/var/adm/binary.errlog
file, using either the
dia
command (preferred)
or the
uerf
command.
For information on the error logger, see Section 12.1.
For the meanings of the error numbers and signal numbers,
see the
intro
(2)
and
sigvec
(2)
reference pages.
11.4.3 Exercising a File System
Use the
fsx
command to exercise the
local file systems.
The
fsx
command exercises the specified
local file system by initiating multiple processes, each of which creates,
writes, closes, opens, reads, validates, and unlinks a test file of random
data.
For more information, see the
fsx
(8)
reference page.
Note
Do not test NFS file systems with the
fsx
command.
The
fsx
command has the following syntax:
fsx
[-fpath
]
[-h
]
[-ofile
]
[-pnum
]
[-tmin
]
You can specify one or more of the following options:
-fpath
Specifies the pathname of the file system directory you want to test.
For example,
-f/usr
or
-f/mnt
.
The default is
/usr/field
.
-h
Displays the command's help message.
-ofile
Saves the output diagnostics in file.
-pnum
Specifies the number of
fsxr
processes you want
fsx
to initiate.
The maximum number of processes is 250.
The default
is 20.
-tmin
Specifies how many minutes you want the
fsx
command
to exercise the file system.
If you do not specify the
-t
option, the
fsx
command runs until you terminate it by
pressing
[Ctrl/c]
in the foreground.
The following example of the
fsx
command tests the
/usr
file system with five
fsxr
processes running
for 60 minutes in the background:
#
fsx -p5 -f/usr -t60 &
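If you start fsx from a script, it can help to check the process count before building the command line. The following sketch only prints the command and does not run it. The fsx_cmd function name is hypothetical, and the lower bound of 1 is an assumption; the only limit stated above is the maximum of 250.

```sh
#!/bin/sh
# Print an fsx command line after checking the process count.
# Assumption: at least one fsxr process is required (the documented
# limit is only the maximum of 250).
fsx_cmd() {
    procs=$1
    path=$2
    minutes=$3
    if [ "$procs" -lt 1 ] || [ "$procs" -gt 250 ]; then
        echo "fsx_cmd: process count must be between 1 and 250" >&2
        return 1
    fi
    echo "fsx -p$procs -f$path -t$minutes"
}

fsx_cmd 5 /usr 60
```

The last line prints the same options that are used in the example above.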
11.4.4 Exercising System Memory
Use the
memx
command to exercise the system memory.
The
memx
command exercises the system memory by initiating
multiple processes.
By default, the size of each process is defined as the
total system memory in bytes divided by 20.
The minimum allowable number
of bytes per process is 4095.
The
memx
command runs 1s
and 0s, 0s and 1s, and random data patterns in the allocated memory being
tested.
The files that you need to run the
memx
exerciser
include the following:
memx
memxr
For more information, see the
memx
(8)
reference page.
The
memx
command is restricted
by the amount of available swap space.
The size of the swap space and the
available internal memory determine how many processes can run simultaneously
on your system.
For example, if there are 16 MB of swap space and 16 MB of
memory, all of the swap space will be used if all 20 initiated processes (the
default) run simultaneously.
This would prevent execution of other processes.
Therefore, on systems with large amounts of memory and small amounts of swap
space, you must use the
-p
or
-m
option, or
both, to restrict the number of
memx
processes or to restrict
the size of the memory being tested.
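The default sizing rule can be worked through with a small calculation. The following sketch is not part of memx. The memx_default_size function name is hypothetical, and the sketch assumes integer division when the total memory is divided by 20.

```sh
#!/bin/sh
# Estimate the default memory size, in bytes, of each memx process:
# total memory divided by 20, but never less than 4095 bytes.
# Assumption: the division truncates to a whole number of bytes.
memx_default_size() {
    size=$(( $1 / 20 ))
    if [ "$size" -lt 4095 ]; then
        size=4095
    fi
    echo "$size"
}

# 16 MB of memory: each of the 20 default processes tests about 0.8 MB.
memx_default_size 16777216
```

With 16 MB of memory, 20 processes of this size together cover roughly the full 16 MB, which is why an equal amount of swap space can be exhausted when all of them run at the same time.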
The
memx
command has the following syntax:
memx [-s] [-h] [-msize] [-ofile] [-pnum] [-tmin]
You can specify one or more of the following options:
-s
Disables the automatic invocation of the shared memory exerciser, shmx.
-h
Displays the command's help message.
-msize
Specifies the amount of memory in bytes for each process you want to test. The default is the total amount of memory divided by 20, with a minimum size of 4095 bytes.
-ofile
Saves the output diagnostics in file.
-pnum
Specifies the number of memxr processes to initiate. The maximum number is 20, which is also the default.
-tmin
Specifies how many minutes you want the memx command to exercise the memory. If you do not specify the -t option, the memx command runs until you terminate it by pressing [Ctrl/c] in the foreground.
The following example of the
memx
command initiates
five
memxr
processes that test 4095 bytes of memory and
runs in the background for 60 minutes:
#
memx -m4095 -p5 -t60 &
11.4.5 Exercising Shared Memory
Use the
shmx
command to exercise the shared memory segments.
The
shmx
command spawns a background process called
shmxb
.
The
shmx
command writes and reads the
shmxb
data
in the segments, and the
shmxb
process writes and reads
the
shmx
data in the segments.
Using
shmx
, you can test the number and the size
of memory segments and
shmxb
processes.
The
shmx
exerciser runs until the process is killed or until the time
specified by the
-t
option is exhausted.
You automatically invoke the
shmx
exerciser when
you start the
memx
exerciser, unless you specify the
memx
command with the
-s
option.
You
can also invoke the
shmx
exerciser manually.
The
shmx
command has the following syntax:
/usr/field/shmx
[-h
]
[-ofile
]
[-v
]
[-ttime
]
[-msize
]
[-sn
]
The
shmx
command options are as follows:
-h
Prints the command's help message.
-ofile
Saves diagnostic output in file.
-v
Uses the
fork
system call instead of the
vfork
system
call to spawn the
shmxb
process.
-ttime
Specifies time as the run time in minutes. The default is to run until the process is killed.
-msize
Specifies
size
as the memory segment size,
in bytes, to be tested by the processes.
The
size
value must be greater than zero.
The default is the value of the SHMMAX and
SHMSEG system parameters, which are set in the
/sys/include/sys/param.h
file.
-sn
Specifies n as the number of memory segments. The default (and maximum) number of segments is 3.
The following example tests the default number of memory segments, each with a default segment size:
#
shmx &
The following example tests three memory segments of 100,000 bytes each for 180 minutes:
#
shmx -t180 -m100000 -s3 &
11.4.6 Exercising a Disk Drive
Use the
diskx
command to exercise
the disk drives.
The main areas that are tested include the following:
Reads, writes, and seeks
Performance
Disktab entry verification
Caution
Some of the tests involve writing to the disk; for this reason, use the exerciser cautiously on disks that contain useful data that the exerciser could overwrite. Tests that write to the disk first check for the existence of file systems on the test partitions and partitions that overlap the test partitions. If a file system is found on these partitions, you are prompted to determine if testing should continue.
You can use the
diskx
command options to specify
the tests that you want performed and to specify the parameters for the tests.
For more information, see the
diskx
(8)
reference page.
The
diskx
command has the following syntax:
diskx
[options
]
[parameters
]
-f devname
The
-f
devname
option specifies
the device special file on which to perform testing.
The
devname
variable specifies the name of the block or character special
file that represents the disk to be tested, such as
/dev/disk/dsk1h
.
The last character of the file name can specify the disk partition
to test.
If a partition is not specified, all partitions are tested.
For example,
if the
devname
variable is
/dev/disk/dsk0
, all partitions are tested.
If the
devname
variable is
/dev/disk/dsk0a
, the
a
partition
is tested.
This parameter must be specified and can be used with all test
options.
The following options specify the tests to be run on disk:
-d
Tests the disk's
disktab
file entry.
The
disktab
entry is obtained
by using the
getdiskbyname
library routine.
This test
only works if the specified disk is a character special file.
See the
disktab
(4)
reference page for more information.
-h
Displays a help message describing test options and parameters.
-p
Specifies a performance
test.
Read and write transfers are timed to measure device throughput.
Data
validation is not performed as part of this test.
Testing uses a range of
transfer sizes if the
-F
option is not specified.
The range of transfer sizes is divided by the number specified with the -perf_splits parameter to obtain a transfer size increment. For example, if the -perf_splits parameter is set to 10, tests are run starting with the minimum transfer size and increasing the transfer size by 1/10th of the range of values for each test repetition. The last transfer size is set to the specified maximum transfer size.
If you do not specify a number of transfers, the transfer count is set to allow the entire partition to be read or written. In this case, the transfer count varies, depending on the transfer size and the partition size.
The performance test runs until completed or until interrupted; the
time is not limited by the
-minutes
parameter.
This
test can take a long time to complete, depending on the test parameters.
To achieve maximum throughput, specify the
-S
option to cause sequential transfers.
If the
-S
option is not specified, transfers are done to random locations.
This may
slow down the observed throughput because of associated head seeks on the
device.
-r
Specifies a read-only
test.
This test reads from the specified partitions.
Specify the
-n
option to run this test on the block special file.
This test is useful for generating system I/O activity. Because it is a read-only test, you can run more than one instance of the exerciser on the same disk.
-w
Specifies a write test. This test verifies that data can be written to the disk and can be read back to verify the data. Seeks are also done as part of this test. This test provides the most comprehensive coverage of disk transfer functions because it uses reads, writes, and seeks. This test also combines sequential and random access patterns.
This test performs the following operations using a range of transfer
sizes; a single transfer size is used if the
-F
option
is specified:
Sequentially writes the entire test partition, unless the
number of transfers has been specified using the
-num_xfer
parameter
Sequentially reads the test partition
The data read from the disk is examined to verify it.
Then,
if random transfer testing has not been disabled (using the
-S
option), writes are issued to random locations on the partition.
After the random writes are completed, reads are issued to random locations
on the partition.
The data read from random locations is examined to verify
it.
The following options modify the behavior of the test:
-F
Performs fixed size
transfers.
If this option is not specified, transfers are done using random
sizes.
This option can be used with the
-p
,
-r
, and
-w
test options.
-i
Specifies interactive mode. In this mode, you are prompted for various test parameters. Typical parameters include the transfer size and the number of transfers. The following scaling factors are allowed:
k or K (for kilobyte (1024 * n))
b or B (for block (512 * n))
m or M (for megabyte (1024 * 1024 * n))
For example, 10 K would specify 10,240 bytes.
-Q
Suppresses performance
analysis of read transfers.
This option only performs write performance testing.
To perform only read testing and to skip the write performance tests, specify
the
-R
option.
The
-Q
option
can be used with the
-p
test option.
-R
Opens the disk in read-only mode. This option can be used with all test options.
-S
Performs transfers
to sequential disk locations.
If this option is not specified, transfers
are done to random disk locations.
This option can be used with the
-p
,
-r
, and
-w
test options.
-T
Directs output to
the terminal.
This option is useful if output is directed to a log file by
using the
-o
option.
If you specify the
-T
option after the
-o
option, output
is directed to both the terminal and the log file.
The
-T
option can be used with all test options.
-Y
Does not prompt you to confirm that you want to continue the test if file systems are found when the disk is examined; testing proceeds.
In addition to the options, you can also specify test parameters.
You
can specify test parameters on the
diskx
command line or
interactively with the
-i
option.
If you do not
specify test parameters, default values are used.
To use a parameter, specify the parameter name, a space, and the numeric value. For example, you could specify the following parameter:
-perf_min 512
You can use the following scaling factors:
k or K (for kilobyte (1024 * n))
b or B (for block (512 * n))
m or M (for megabyte (1024 * 1024 * n))
For example,
-perf_min 10K
causes transfers
to be done in sizes of 10,240 bytes.
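The scaling factors can be expressed as a small shell function. The following is a minimal sketch and is not part of diskx. The to_bytes function name is hypothetical, and the sketch assumes that the suffix immediately follows an integer value, as in 10K.

```sh
#!/bin/sh
# Convert a value with an optional scaling suffix to bytes.
#   k or K = kilobytes (1024 * n)
#   b or B = blocks (512 * n)
#   m or M = megabytes (1024 * 1024 * n)
# Assumption: the part before the suffix is a plain integer.
to_bytes() {
    value=$1
    case $value in
        *[kK]) echo $(( ${value%[kK]} * 1024 )) ;;
        *[bB]) echo $(( ${value%[bB]} * 512 )) ;;
        *[mM]) echo $(( ${value%[mM]} * 1024 * 1024 )) ;;
        *)     echo "$value" ;;
    esac
}

to_bytes 10K
```

The final line prints 10240, which matches the -perf_min 10K example above.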
You can specify one or more of the following parameters:
-debug
Specifies the level of diagnostic output to be produced. The greater the number specified, the more output is produced describing the exerciser operations. This parameter can be used with all test options.
-err_lines
Specifies the maximum number of error messages that are produced as a result of an individual test. A limit on error output prevents a large number of diagnostic messages if persistent errors occur. This parameter can be used with all test options.
-minutes
Specifies the
number of minutes to test.
This parameter can be used with the
-r
and
-w
test options.
-max_xfer
Specifies
the maximum transfer size to be performed.
If transfers are done using random
sizes, the sizes are within the range specified by the
-max_xfer
and
-min_xfer
parameters.
If fixed size
transfers are specified (see the
-F
option), transfers
are done in a size specified by the
-min_xfer
parameter.
Specify transfer sizes to the character special file in multiples of
512 bytes.
If the specified transfer size is not an even multiple, the value
is rounded down to the nearest 512 bytes.
This parameter can be used with
the
-r
and
-w
test options.
-min_xfer
Specifies the
minimum transfer size to be performed.
This parameter can be used with the
-r
and
-w
test options.
-num_xfer
Specifies the
number of transfers to perform before changing the partition that is currently
being tested.
This parameter is only useful if more than one partition is
being tested.
If this parameter is not specified, the number of transfers
is set to a number that completely covers a partition.
This parameter can
be used with the
-r
and
-w
test options.
-ofilename
Sends output to the specified file name. The default is to display output on the terminal screen. This parameter can be used with all test options.
-perf_max
Specifies the
maximum transfer size to be performed.
If transfers are done using random
sizes, the sizes are within the range specified by the
-perf_min
and
-perf_max
parameters.
If fixed size
transfers are specified (see the
-F
option), transfers
are done in a size specified by the
-perf_min
parameter.
This parameter can be used with the
-p
test option.
-perf_min
Specifies the
minimum transfer size to be performed.
This parameter can be used with the
-p
test option.
-perf_splits
Specifies
how the transfer size will change if you test a range of transfer sizes.
The range of transfer sizes is divided by the number specified with the
-perf_splits
parameter to obtain a transfer size increment.
For example, if the
-perf_splits
parameter is set
to 10, tests are run starting with the minimum transfer size and increasing
the transfer size by 1/10th of the range of values for each test repetition.
The last transfer size is set to the specified maximum transfer size.
This
parameter can be used with the
-p
test option.
-perf_xfers
Specifies
the number of transfers to be performed in performance analysis.
If this
value is not specified, the number of transfers is set equal to the number
that is required to read the entire partition.
This parameter can be used
with the
-p
test option.
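To see how the -perf_min, -perf_max, and -perf_splits parameters interact, the following sketch lists the transfer sizes that a performance test might step through. It is an interpretation of the description above, not the diskx implementation. The perf_sizes function name is hypothetical, and the sketch assumes that the increment is truncated to a whole number of bytes, that the number of splits is greater than zero, and that the maximum size is used for one final repetition.

```sh
#!/bin/sh
# List transfer sizes from a minimum to a maximum in equal steps.
# Arguments: minimum size, maximum size, number of splits (all in bytes
# or counts, as plain integers; splits must be greater than zero).
perf_sizes() {
    min=$1
    max=$2
    splits=$3
    step=$(( ($max - $min) / $splits ))
    i=0
    while [ "$i" -lt "$splits" ]; do
        echo $(( $min + $i * $step ))
        i=$(( $i + 1 ))
    done
    # The last transfer size is set to the specified maximum.
    echo "$max"
}

perf_sizes 1024 11264 10
```

With a range of 1024 to 11264 bytes and 10 splits, the increment is 1024 bytes, so the sizes run from 1024 up to 10240 and end at 11264.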
The following example performs read-only testing on the character device
special file that
/dev/rdisk/dsk0
represents.
Because
a partition is not specified, the test reads from all partitions.
The default
range of transfer sizes is used.
Output from the exerciser program is displayed
on the terminal screen:
#
diskx -f /dev/rdisk/dsk0 -r
The following example runs on the
a
partition of
/dev/disk/dsk0
, and program output is logged to the
diskx.out
file.
The program output level is set to 10 and causes additional
output to be generated:
#
diskx -f /dev/disk/dsk0a -o diskx.out -d -debug 10
The following example shows that performance tests are run on the
a
partition of
/dev/disk/dsk0
, and program output
is logged to the
diskx.out
file.
The
-S
option causes sequential transfers for the best test results.
Testing
is done over the default range of transfer sizes:
#
diskx -f /dev/disk/dsk0a -o diskx.out -p -S
The following command runs the read test on all partitions of the specified disks. The disk exerciser is invoked as three separate processes, which generate extensive system I/O activity. The command shown in this example can be used to test system stress:
#
diskx -f /dev/rdisk/dsk0 -r & diskx -f /dev/rdisk/dsk1 -r & diskx -f /dev/rdisk/dsk2 -r &
11.4.7 Exercising a Tape Drive
Use
the
tapex
command to exercise a tape drive.
The
tapex
command writes, reads, and validates random data on a tape
device from the beginning-of-tape (BOT) to the end-of-tape (EOT).
The
tapex
command also performs positioning tests for records and files,
and tape transportability tests.
For more information, refer to the
tapex
(8)
reference page.
Some
tapex
options perform specific tests (for example,
an end-of-media (EOM) test).
Other options modify the tests, for example,
by enabling caching.
The
tapex
command has the following syntax:
tapex
[options
]
[parameters
]
You can specify one or more of the options described in
Table 11-4.
In addition to options, you can also specify test parameters.
You specify
parameters on the
tapex
command line or interactively with
the
-i
option.
If you do not specify test parameters,
default values are used.
To use a test parameter, specify the parameter name, a space, and the number value. For example, you could specify the following parameter:
-min_rs 512
You can use the following scaling factors:
k or K (for kilobyte (1024 * n))
b or B (for block (512 * n))
m or M (for megabyte (1024 * 1024 * n))
For example, 10 K would specify 10,240 bytes.
The following parameters can be used with all tests:
-err_lines
Specifies the error printout limit.
-fixed
bs
Specifies a fixed block device. Record sizes for most devices default to multiples of the blocking factor of the fixed block device as specified by the bs argument.
The following parameters can be used with the
-a
option, which measures performance:
-perf_num
Specifies the number of records to write and read.
-perf_rs
Specifies the size of records.
Other parameters are restricted for use with specific
tapex
options.
Option-specific parameters are documented in
Table 11-4.
Table 11-4: The tapex Options and Option Parameters
tapex Flag | Flag and Parameter Descriptions |
-a |
Specifies the performance measurement
test, which calculates the tape transfer bandwidth for writes and reads to
the tape by timing data transfers.
The following parameters can be used with
the
|
-b |
Causes the write/read tests to run continuously
until the process is killed.
This flag can be used with the
-r
and
-g
flags. |
-c |
Enables caching on the device, if supported. This flag does not specifically test caching; it enables the use of caching on a tape device while other tests are running. |
-C |
Disables caching on TMSCP tape devices. If the tape device is a TMSCP unit, then caching is the default mode of test operation. This flag causes the tests to run in noncaching mode. |
-d |
Tests the ability to append records
to the media.
First, the test writes records to the tape.
Then, it repositions
itself back one record and appends additional records.
Finally, the test does
a read verification.
This test simulates the behavior of the
|
-e |
Specifies EOM test. First, this test writes data to fill a tape; this action can take some time for long tapes. It then performs reads and writes past the EOM; these actions should fail. Finally, it enables writing past the EOM, writes to the tape, and reads back the records for validation purposes. The following parameters
can be used with the
|
-E |
Runs an extensive series of tests in sequential order. Depending on tape type and CPU type, this series of tests can take up to 10 hours to complete. |
-f device |
Specifies the name of the device special
file that corresponds to the tape unit being tested.
Refer to
Chapter 6
for information on device names.
/dev/tape/tape0_d0
is
the default device. |
-F |
Specifies the file-positioning tests.
First, files are written to the tape and verified.
Next, every other file
on the tape is read.
Then, the previously unread files are read by traversing
the tape backwards.
Finally, random numbers are generated, the tape is positioned
to those locations, and the data is verified.
Each file uses a different record
size.
The following parameters can be used with the
|
-G |
Specifies the file-positioning tests on a
tape containing data.
This flag can be used with the
-F
flag to run the file position tests on a tape that has been written to by
a previous invocation of the
-F
test.
To perform
this test, you must use the same test parameters (for example, record size
and number of files) that you used when you invoked the
-F
test to write to the tape.
No other data should have been written
to the tape since the previous
-F
test. |
-g |
Specifies random record size tests.
This test writes records of random sizes.
It reads in the tape, specifying
a large read size; however, only the amount of data in the randomly sized
record should be returned.
This test only checks return values; it does not
validate record contents.
The following parameter is used with the
|
-h |
Displays a help message describing the tape exerciser. |
-i |
Specifies interactive mode. In this mode, you are prompted for various test parameters. Typical parameters include the record size and the number of records to write. The following scaling factors are allowed: k or K (for kilobyte (1024 * n)), b or B (for block (512 * n)), and m or M (for megabyte (1024 * 1024 * n)). For example, 10 K would specify 10,240 bytes. |
-j |
Specifies the write phase of the tape-transportability
tests.
This test writes a number of files to the tape and then verifies the
tape.
After the tape has been successfully verified, it is brought off line,
moved to another tape unit, and read in with the
|
-k |
Specifies the read phase of the tape-transportability
tests.
This test reads a tape that was written by the
-j
test and verifies that the expected data is read from the tape.
This test
proves that you can write to a tape on one drive and read from a tape on another
drive.
As stated in the description of the
-j
flag,
any parameters specified with the
-j
flag must be
specified with the
-k
flag.
(See the description
of the
-j
flag for information on the parameters
that apply to the
-j
and
-k
flags.) |
-L |
Specifies the media loader test.
For sequential
stack loaders, the media is loaded, written to, and verified.
Then, the media
is unloaded, and the test is run on the next piece of media.
This verifies
that all of the media in the input deck can be written to.
To run this test
in read-only mode, also specify the
-w
flag. |
-l |
Specifies the EOF test. This test verifies that a zero byte count is returned when a tape mark is read and that an additional read fetches the first record of the next tape file. |
-m |
Displays tape contents. This is not a test. This flag reads the tape sequentially and prints out the number of files on the tape, the number of records in each file, and the size of the records within the file. The contents of the tape records are not examined. |
-o filename |
Sends output to the specified file name. The default sends output to the terminal screen. |
-p |
Runs both the record-positioning and file-positioning
tests.
For more information, refer to descriptions of the
-R
and
-F
flags. |
-q |
Specifies the command timeout test. This test verifies that the driver allows enough time for completion of long operations. This test writes files to fill the tape. It then performs a rewind, followed by a forward skip to the last file. This test is successful if the forward skip operation is completed without error. |
-r |
Specifies the record size test.
A number
of records are written to the tape and then verified.
This process is repeated
over a range of record sizes.
The following parameters can be used with the
|
-R |
Specifies the record-positioning test.
First, records are written to the tape and verified.
Next, every other record
on the tape is read.
Then, the other records are read by traversing the tape
backwards.
Finally, random numbers are generated; the tape is positioned
to those locations, and the data is verified.
The following parameters can
be used with the
|
-s |
Specifies the record size behavior
test.
Verifies that a record that is read returns one record (at most) or
the read size, whichever is less.
The following parameters can be used with
the
|
-S |
Specifies single record size test.
This test modifies the record size test (the
|
-T |
Displays output to the terminal screen.
This flag is useful if you want to log output to a file with the
-o
flag and also have the output displayed on your terminal
screen.
This flag must be specified after the
-o
flag in the command line. |
-v |
Specifies verbose mode. This flag causes detailed information to be output. For example, it lists the operations the exerciser is performing (such as record counts), and detailed error information. Information provided by this flag can be useful for debugging purposes. |
-V |
Specifies enhanced verbose mode.
This flag
causes output of more detailed information than the
-v
flag.
The additional output consists of status information on exerciser operations.
Information provided by this flag can be useful for debugging purposes. |
-w |
Opens the tape as read-only.
This mode is
useful only for tests that do not write to the media.
For example, it allows
the
-m
test to be run on a write-protected media. |
-Z |
Initializes the read buffer to the nonzero value 0130. This can be useful for debugging purposes. If the -Z flag is not specified, all elements of the read buffer are initialized to zero. Many of the tests first initialize their read buffer and then perform the read operation. After reading a record from the tape, some tests validate that the unused portions of the read buffer remain at the value to which they were initialized. For debugging purposes, you can set this initialized value to a number other than zero. In this case, you can use the arbitrary value 0130. |
The following example runs an extensive series of tests on tape device /dev/tape/tape0_d0 and sends all output to the tapex.out file:
# tapex -f /dev/tape/tape0_d0 -E -o tapex.out
The following example performs random record size tests and outputs information in verbose mode. This test runs on the default tape device /dev/tape/tape0_d0, and the output is sent to the terminal screen.
# tapex -g -v
The following example performs read and write record testing using record sizes in the range 10 K to 20 K. This test runs on the default tape device /dev/tape/tape0_d0, and the output is sent to the terminal screen.
# tapex -r -min_rs 10k -max_rs 20k
The following example performs a series of tests on tape device /dev/tape/tape0_d0, which is treated as a fixed-block device in which record sizes for tests are multiples of the blocking factor 512 KB. The append-to-media test is not performed.
# tapex -f /dev/tape/tape0_d0 -fixed 512 -no_overwrite
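The ordering rule for the -o and -T flags can be captured in a small wrapper. The following is a minimal sketch and is not part of tapex: the function name build_tapex_cmd is hypothetical, and the function only prints a command line so that you can review it before running it.

```shell
# Hypothetical helper: print a tapex command line that logs output to a
# file and also displays it on the terminal screen.
# Usage: build_tapex_cmd device logfile [test flags...]
build_tapex_cmd() {
    device=$1
    logfile=$2
    shift 2
    # -T is placed after -o, as the -T flag description requires.
    echo "tapex -f $device $* -o $logfile -T"
}

build_tapex_cmd /dev/tape/tape0_d0 tapex.out -E
# prints: tapex -f /dev/tape/tape0_d0 -E -o tapex.out -T
```

Printing the command instead of running it keeps the sketch safe to try on a system where no scratch tape is loaded.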
11.4.8 Exercising the Terminal Communication System
Use the cmx command to exercise the terminal communications system. The cmx command writes, reads, and validates random data and packet lengths on the specified communications lines.
The lines you exercise must have a loopback connector attached to the distribution panel or the cable. Also, the line must be disabled in the /etc/inittab file and must be a nonmodem line; that is, the CLOCAL option must be set to on. Otherwise, the cmx command repeatedly displays error messages on the terminal screen until its time expires or until you press [Ctrl/c]. For more information, refer to the cmx(8) reference page.
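Before you run the exerciser, you might want to confirm that a line is disabled in /etc/inittab. The following is a minimal sketch under stated assumptions: the function name line_disabled is hypothetical, entries are assumed to use the id:runlevels:action:process layout, and a line is assumed to be disabled when its action field is off. The sketch does not check the CLOCAL setting or the loopback connector.

```shell
# Hypothetical helper: succeed if every inittab entry whose process field
# names the given tty line has the action "off".
# A line that is not listed at all is also reported as disabled.
# Usage: line_disabled ttyNN [inittab-file]
line_disabled() {
    line=$1
    inittab=${2:-/etc/inittab}      # a copy of the file can be passed instead
    awk -F: -v tty="$line" '
        /^#/ { next }                               # skip comment lines
        {
            n = split($4, words, /[ \t]+/)          # words of the process field
            for (i = 1; i <= n; i++)
                if (words[i] == tty && $3 != "off")
                    bad = 1                         # line is still enabled
        }
        END { exit bad + 0 }
    ' "$inittab"
}
```

For example, line_disabled tty02 returns a zero exit status only if no enabled entry for tty02 is found, so it can guard a later cmx invocation in a script.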
You cannot test pseudodevice lines or lta device lines. Pseudodevices have p, q, r, s, t, u, v, w, x, y, or z as the first character after tty, for example, ttyp3.
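The naming rule for pseudodevices can be checked in a script before a line is passed to cmx. The following is a minimal sketch; the function name is_pseudo_line is hypothetical, and the check looks only at the device name, so it does not detect lta device lines.

```shell
# Hypothetical helper: succeed (exit status 0) if the name looks like a
# pseudodevice line, that is, tty followed by one of the letters p to z.
is_pseudo_line() {
    case $1 in
        tty[pqrstuvwxyz]*) return 0 ;;
        *)                 return 1 ;;
    esac
}

for name in tty02 ttyp3 ttyq0; do
    if is_pseudo_line "$name"; then
        echo "$name: pseudodevice, cannot be tested with cmx"
    else
        echo "$name: not a pseudodevice by name"
    fi
done
```

The letters are listed explicitly rather than as a range so that the pattern does not depend on the collation order of the current locale.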
The cmx command has the following syntax:
/usr/field/cmx [-h] [-o file] [-t min] [-l line]
The cmx command options are as follows:
-h
Prints the command's help message.
-o file
Saves output diagnostics in file.
-t min
Specifies how many minutes you want the cmx command to exercise the communications system. If you do not specify the -t option, the cmx command runs until you terminate it by pressing [Ctrl/c] in the foreground.
-l line
Specifies the line or lines you want to test. The possible values for line are found in the /dev directory and are the last two characters of the tty device name. For example, if you want to test the communications system for devices named tty02, tty03, and tty14, specify 02, 03, and 14, separated by spaces, for the line variable. In addition, the line variable can specify a range of lines to test, for example, 00-08.
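The mapping from tty device names to values for the line variable can be sketched as a small shell function. This is an illustration only: the function name line_ids is hypothetical, and it simply takes the last two characters of each name without checking that the device exists.

```shell
# Hypothetical helper: print the cmx line identifier for each tty device
# name, that is, the last two characters of the name.
line_ids() {
    ids=""
    for dev in "$@"; do
        name=${dev##*/}              # drop a leading directory such as /dev/
        rest=${name%??}              # the name without its last two characters
        ids="$ids ${name#"$rest"}"   # keep only the last two characters
    done
    echo $ids                        # unquoted on purpose: single-spaces the list
}

line_ids tty02 tty03 /dev/tty14
# prints: 02 03 14
```

The printed list is in the space-separated form that the -l option expects.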
The following example exercises communication lines tty22 and tty34 for 45 minutes in the background:
# cmx -l 22 34 -t45 &
The following example exercises lines tty00 through tty07 until you press [Ctrl/c]:
# cmx -l 00-07