Chapter 10 Troubleshooting

This chapter contains information that will help you keep Performance Manager running properly.

Log Files

The Performance Manager GUI writes messages to a log file, /var/opt/pm/log/pmgr_gui.log. The Performance Manager daemon (pmgrd) also writes messages to a log file, /var/opt/pm/log/pmgrd.log. These log files provide a history that is useful for troubleshooting and debugging.

The installation procedure creates initial copies of the log files with appropriate protections. For security reasons, the log directory ( /var/opt/pm/log ) is protected so that no new files can be created in it. If a log file is deleted, an appropriately protected empty file must be left in its place; otherwise, no new process (that writes to that particular log file) can be started.
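
As a minimal sketch of leaving an empty file in place of a deleted log file, the following shell function creates the file only if it is missing and then applies a permission mode. The function name pm_recreate_log is hypothetical, and this guide does not state the required owner or mode, so the mode is passed in as an argument; one way to choose it is to compare with a surviving log file by using ls -l. Because the log directory is protected, the function would need to be run from a root account.

```shell
# Hypothetical helper: recreate an empty log file only if it is missing.
# $1 = log file path
# $2 = permission mode (an assumption supplied by the caller; not given in this guide)
pm_recreate_log() {
    log=$1 mode=$2
    if [ -f "$log" ]; then
        # Never truncate an existing log file.
        echo "$log already exists; leaving it untouched" >&2
        return 0
    fi
    # Create the empty file, then apply the requested mode.
    : > "$log" && chmod "$mode" "$log"
}
```

For example, pm_recreate_log /var/opt/pm/log/pmgrd.log "$MODE" would restore the pmgrd log file, where MODE holds the mode you determined for your installation.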

To view just the last 50 lines of a log file (the GUI log file, in this example), issue the following command:

% tail -50 /var/opt/pm/log/pmgr_gui.log | more

All log files use the following entry format. Each entry has three lines; the second and third lines are indented. Vertical bars separate the fields in a line:

date_time | local_host | remote_host | user
    severity | error_code | module | line_number
    error_text

The following table describes each field in a log file entry.

Log File Field    Description
date_time         The date and time the entry was written.
local_host        The node running the process that generated the entry.
remote_host       The node that originated the request. For user interface log files, remote_host is always blank because there is no remote node. For daemon log files, remote_host is blank only if a local event caused the entry.
user              The user running the application. For user interface log files, this is the login name. For daemon log files, this is the login name of the user on the remote node, if it is available. The field is blank if the daemon is unable to determine the name of the application user. For daemon messages that are not caused by a remote request, the user field is Daemon.
severity          Possible values are Info, Warn, Fatal, and Debug.
error_code        A string that identifies an error.
module            The program module that generated the entry.
line_number       The line number in the program module where the entry originated.
error_text        A description of the message.

Example Log File Entry

May 24 11:47:03 1999 | oscar.zso.dec.com | | root (smith)
    error | PMD_NOSUCHINST | pmdci_manager.c | line 2158
    The specified instance does not exist
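
Because every entry follows the same three-line layout, entries can be filtered with standard tools. The following is a sketch only, not part of Performance Manager: the function name pm_log_filter is hypothetical, and it assumes that each entry consists of exactly three non-blank lines, as described above. It prints the date, error code, and error text of each entry whose severity field matches the requested value, ignoring case.

```shell
# Hypothetical helper: list log entries with a given severity.
# $1 = severity to match (case-insensitive), $2 = log file
# Assumes each entry is exactly three non-blank lines.
pm_log_filter() {
    awk -F'|' -v want="$1" '
        # Strip leading and trailing blanks from a field.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        $0 ~ /^[ \t]*$/ { next }                  # skip blank lines
        { n++ }                                    # position within the entry
        n == 1 { when = trim($1) }                 # first line: date_time
        n == 2 { sev = trim($1); code = trim($2) } # second line: severity, error_code
        n == 3 {                                   # third line: error_text
            if (tolower(sev) == tolower(want))
                printf "%s | %s | %s\n", when, code, trim($0)
            n = 0
        }
    ' "$2"
}
```

For example, pm_log_filter error /var/opt/pm/log/pmgrd.log would select entries like the one shown above, whose severity field is error.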

Nodes Not Responding

If a node is not responding to the Performance Manager GUI, its icon shows a hand holding the world down.

Either the network link to that node is broken, the node has crashed, or the node doesn't exist in the network.

The installation script starts all Performance Manager metrics servers automatically after a successful installation and configuration, and these servers are started automatically at boot time. Use the startup information about these servers only if you need to restart a Performance Manager server.

Performance Manager Tru64 UNIX Metrics Server (pmgrd)

This server must run on each node managed by Performance Manager. Without pmgrd, the Performance Manager GUI cannot gather its data from that node.

To see if Performance Manager's Tru64 UNIX metrics server is running, issue the following command:

# ps awx | grep pmgrd

If the server is running, you should see output similar to the following:

329 ?? S < 0:16.02 bin/pmgrd

292 ttyp1 S + 0:00.03 grep pmgrd
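
Note that the grep command itself appears in the output, so a matching line alone does not prove that the server is running. As a hedged sketch, the following filter reads ps output on standard input and succeeds only if some line mentions the server name and is not the grep process. The function name pm_server_listed is hypothetical, and the match is a simple substring test on the whole line.

```shell
# Hypothetical helper: succeed if ps output on stdin lists the named server.
# $1 = server name, for example pmgrd or clu_mib
pm_server_listed() {
    awk -v name="$1" '
        # A line counts only if it mentions the name and is not the grep itself.
        $0 ~ name && $0 !~ /grep/ { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

For example: ps awx | pm_server_listed pmgrd && echo "pmgrd is running". The same filter can be used with clu_mib as the argument.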

If pmgrd is not running, it either failed to start or has crashed; see the pmgrd log file, /var/opt/pm/log/pmgrd.log, for the cause. To start pmgrd from the Performance Manager GUI, follow these steps:

  1. From the main window's Execute menu, choose System Management Command Category.
  2. Choose the Start Stop Pmgrd command from this submenu.
  3. Choose the node on which to start pmgrd.
  4. Press OK or Apply to start pmgrd on the selected node.

To start pmgrd from a root account, issue the pmgrd command with the start argument:

/usr/opt/pm/scripts/pmgrd start

If pmgrd is not starting at boot time, ensure that these boot-time startup files exist:

/sbin/rc2.d/K47pmgrd

/sbin/rc3.d/S47pmgrd

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide).

For more information, see the pmgrd(8) reference page.

Performance Manager TruCluster Metrics Server (clu_mib)

This server must run on each cluster where Performance Manager runs commands. Without clu_mib, a command cannot run on a cluster, and it cannot display its output to the Performance Manager GUI.

Beginning with Tru64 UNIX Version 5, this server ships with the operating system. In earlier releases the server shipped with the Performance Manager product. To successfully use a Version 5 system to monitor Tru64 UNIX Version 4.x systems, you must install the clu_mib metrics server on the monitored systems. You can ensure this configuration by installing the appropriate PM Version 4.0x on these systems.

To see if Performance Manager's TruCluster metrics server is running, issue the following command:

# ps awx | grep clu_mib

If the server is running, you should see output similar to the following:

329 ?? S < 0:16.02 bin/clu_mib

292 ttyp1 S + 0:00.03 grep clu_mib

If clu_mib is not running, it either failed to start or has crashed; see the clu_mib log file, /var/opt/pm/log/clu_mib.log, for the cause. To start clu_mib from the Performance Manager GUI, follow these steps:

  1. From the main window's Execute menu, choose System Management Command Category.
  2. Choose the Start Stop Clstrmond command from this submenu.
  3. Choose the node on which to start clu_mib.
  4. Press OK or Apply to start clu_mib on the selected node.

To start clu_mib from a root account, issue the clu_mib command with the start argument:

/usr/opt/pm/scripts/clu_mib start

If clu_mib is not starting at boot time, ensure that these boot-time startup files exist:

/sbin/rc2.d/K47clu_mib

/sbin/rc3.d/S47clu_mib

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide). The MIB file describing the metrics provided by the TruCluster metrics server is provided in this location:

/usr/opt/pm/data/cluster_mib

For more information, see the clu_mib(8) reference page.

Metrics Servers or GUI Will Not Start

If the GUI or metrics servers fail to start, their log files might be missing. If the GUI fails to appear and there is no error message, check that the DISPLAY environment variable is set correctly, and confirm that the display accepts connections from the node running the GUI (for example, by using the xhost command).

If pmgrd fails to start automatically when a node is rebooted, but can be started manually, its startup files might be missing.

No Log File

The installation procedure creates initial copies of the log files with appropriate protections. For security reasons, the log directory (/var/opt/pm/log) is protected so that no new files can be created in it. If a log file is deleted, an appropriately protected empty file must be left in its place; otherwise, no new process (that writes to that particular log file) can be started.

  • The GUI log file is /var/opt/pm/log/pmgr_gui.log.
  • The pmgrd log file is /var/opt/pm/log/pmgrd.log.
  • The clu_mib log file is /var/opt/pm/log/clu_mib.log.

No Startup Files

The installation script writes entries in system startup files that start pmgrd automatically each time a node is rebooted. If pmgrd is not starting on a node after it is booted, check the following files and be sure they have the correct entries:

/sbin/rc2.d/K47pmgrd

/sbin/rc3.d/S47pmgrd

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide).
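
The check for startup files can be sketched as a small shell function. This is an illustration under stated assumptions: the function name pm_check_startup and its optional second argument (a directory prefix, useful for trying the function outside the real /sbin tree) are hypothetical. It reports whether each of the two startup files named above is present and returns a nonzero status if either is missing. It does not verify that the files have the correct contents.

```shell
# Hypothetical helper: report which boot-time startup files exist for a server.
# $1 = server name (pmgrd or clu_mib)
# $2 = optional directory prefix (hypothetical; empty means the real /sbin tree)
pm_check_startup() {
    name=$1 root=${2:-}
    status=0
    for f in "$root/sbin/rc2.d/K47$name" "$root/sbin/rc3.d/S47$name"; do
        if [ -f "$f" ]; then
            echo "present: $f"
        else
            echo "MISSING: $f"
            status=1
        fi
    done
    return $status
}
```

For example, pm_check_startup pmgrd checks the two pmgrd files, and pm_check_startup clu_mib checks the corresponding TruCluster metrics server files.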

Commands Not Running

If commands fail to run on certain nodes:

  1. Make sure the nodes are up.
  2. Before running commands on remote nodes, you must have a login ID, and the /.rhosts file on each remote node must give root access to the node running the Performance Manager GUI. Specify both a node alias and a fully qualified domain name. For example:

gui_node root

gui_node.usc.edu.com root
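
To confirm that a remote node's /.rhosts file contains both forms of the entry, a check along the following lines could be run on that node. This is a sketch only: the function name pm_rhosts_ok is hypothetical, and it tests just for lines whose first field is the host name and whose second field is the user name; it does not validate file ownership or permissions.

```shell
# Hypothetical helper: succeed only if the rhosts file lists the user
# for both the node alias and the fully qualified name.
# $1 = rhosts file, $2 = node alias, $3 = fully qualified name, $4 = user (default root)
pm_rhosts_ok() {
    file=$1 user=${4:-root}
    for host in "$2" "$3"; do
        # Look for a line whose first field is the host and second field is the user.
        awk -v h="$host" -v u="$user" \
            '$1 == h && $2 == u { ok = 1 } END { exit(ok ? 0 : 1) }' "$file" || {
            echo "no entry for $host $user in $file" >&2
            return 1
        }
    done
}
```

For example, pm_rhosts_ok /.rhosts gui_node gui_node.usc.edu.com checks for the two entries shown above.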

Disks Not Visible to Performance Manager

If your kernel configuration does not match your disk configuration, Performance Manager may not recognize the disks that are not configured in the kernel. When you add disks to your system configuration, check that your kernel is configured for the new device. If needed, run the doconfig command to update your kernel. See the doconfig(8) reference page for more information.

Reporting Bugs

If an error occurs while installing or using Performance Manager, and you believe the error is caused by a problem with the product, take one of the following actions:

  • If you have a basic or DECsupport Software Agreement, call your Customer Support Center. The Customer Support Center provides high-level advisory and remedial assistance.
  • If you have a Self-Maintenance Software Agreement or you purchased Performance Manager within the past 90 days, you can submit a Software Performance Report.
  • For documentation problems, casual questions, or suggestions, use the response form, or email us at pm_feedback@compaq.com.

Software Performance Reports

When you submit a Software Performance Report, please take the following steps:

  • Reduce the problem to as small a size as possible.
  • Describe as accurately as possible the circumstances and state of the node when the problem occurred. Include the description and version number of Performance Manager being used. Demonstrate the problem with specific examples.
  • Report only one problem per Software Performance Report; this ensures a faster response.
  • Mail the Software Performance Report package to Compaq.

Many Software Performance Reports do not contain enough information to duplicate or identify the problem. Concise, complete information helps Compaq respond to software problems accurately and promptly.