Chapter 11 Troubleshooting

Log files

The Performance Manager GUI writes messages to a log file, /var/opt/pm/log/pmgr_gui.log. The Performance Manager daemon (pmgrd) also writes messages to a log file, /var/opt/pm/log/pmgrd.log. These log files provide a history that is useful for troubleshooting and debugging.

The installation procedure creates initial copies of the log files with appropriate protections. For security reasons, the log directory (/var/opt/pm/log) is protected so that no new files can be created in it. If a log file is deleted, an appropriately protected empty file must be left in its place; otherwise, no new process that writes to that log file can be started.

To view just the last 50 lines of a log file (the GUI log file, in this example), issue the following command:

% tail -50 /var/opt/pm/log/pmgr_gui.log | more

Here is the entry format used in all log files. Each entry has three lines, the second and third lines being indented. Vertical bars separate each field in a line:

date_time|local_host|remote_host|user
    severity|error_code|module|line_number
    error_text

The following table describes each field in a log file entry.

Log file field descriptions

Log file field Description
date_time The date and time the entry was written.
local_host The node running the process that generated the entry.
remote_host The node that originated the request. For user interface log files, remote_host is always blank because there is no remote node. For daemon log files, remote_host is blank only if a local event caused the entry.
user The user running the application. For user interface log files, this is the login name. For daemon log files, this is the login name of the user on the remote node, if it is available. The field is blank if the daemon is unable to determine the name of the application user. For daemon messages that are not caused by a remote request, the user field is Daemon.
severity Possible values are Info, Warn, Fatal, and Debug.
error_code A string that identifies an error.
module The program module that generated the entry.
line_number The line number in the program module where the entry originated.
error_text A description of the message.

Example log file entry

Oct 24 11:47:03 1997|oscar.zso.dec.com||root (smith)
    error|PMD_NOSUCHINST|pmdci_manager.c|line 2158
    The specified instance does not exist
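Because every entry follows the same three-line layout, a log file can be reduced to one line per entry and filtered by severity. The following is a minimal sketch and is not part of Performance Manager: the function name pm_log_filter is hypothetical, and the sketch assumes that each entry occupies exactly three non-blank lines in the order described above.

```shell
# Hypothetical helper, not shipped with Performance Manager.
# Reads a log on standard input and prints
#   date_time severity error_code: error_text
# for each entry whose severity equals $1 (case is ignored).
# Assumes every entry is exactly three non-blank lines.
pm_log_filter() {
    awk -F'|' -v want="$1" '
        # Strip leading and trailing blanks from a string.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        /^[ \t]*$/ { next }            # skip blank lines

        {
            n++
            if (n == 1) {              # date_time|local_host|remote_host|user
                when = trim($1)
            } else if (n == 2) {       # severity|error_code|module|line_number
                sev = trim($1)
                code = trim($2)
            } else {                   # error_text
                if (tolower(sev) == tolower(want))
                    print when " " sev " " code ": " trim($0)
                n = 0
            }
        }'
}
```

For example, to list the entries in the daemon log whose severity is Fatal:

% pm_log_filter Fatal < /var/opt/pm/log/pmgrd.log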

Nodes not responding

If a node is not responding to the Performance Manager GUI, its icon shows a riderless horse.

Either the network link to that node is broken, the node has crashed, or the node does not exist in the network.

The installation script starts all Performance Manager metrics servers automatically after a successful installation and configuration, and these servers are started automatically at boot time. Use the startup information about these servers only if you need to restart a Performance Manager server.

Performance Manager Tru64 UNIX metrics server (pmgrd)

This server must run on each node managed by Performance Manager. Without pmgrd, the Performance Manager GUI cannot gather its data from that node.

To see if Performance Manager's Tru64 UNIX metrics server is running, issue the following command:

# ps awx | grep pmgrd

If the server is running, you should see output similar to the following:

329 ?? S < 0:16.02 bin/pmgrd

292 ttyp1 S + 0:00.03 grep pmgrd
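Note that the grep process itself appears in the listing, so a match alone does not prove the server is running. The following is a minimal sketch of a check that ignores the grep line. The function name daemon_listed is hypothetical, and the check simply skips any line that mentions grep.

```shell
# Hypothetical helper, not shipped with Performance Manager.
# Reads a process listing on standard input and succeeds if some line
# mentions the name given in $1 and does not mention grep.
daemon_listed() {
    awk -v name="$1" '
        index($0, name) && !index($0, "grep") { found = 1 }
        END { exit !found }'
}
```

For example:

# ps awx | daemon_listed pmgrd && echo "pmgrd is running"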

If pmgrd is not running, it either failed to start or has crashed; see the pmgrd log file, /var/opt/pm/log/pmgrd.log, for the cause. To start pmgrd from the Performance Manager GUI, follow these steps:

  1. From the main window's Execute menu, choose System Management Command Category.
  2. Choose the Start Stop Pmgrd command from this submenu.
  3. Choose the node on which to start pmgrd.
  4. Press OK or Apply to start pmgrd on the selected node.

To start pmgrd from a root account, issue the pmgrd command with the start argument:

/usr/opt/pm/scripts/pmgrd start

If pmgrd is not starting at boot time, ensure that these boot-time startup files exist:

/sbin/rc2.d/K47pmgrd

/sbin/rc3.d/S47pmgrd

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide).

For more information, see the pmgrd(8) reference page.

Performance Manager TruCluster metrics server (clstrmond)

This server must run on each cluster where Performance Manager runs commands. Without clstrmond, a command cannot run on a cluster and cannot display its output to the Performance Manager GUI.

To see if Performance Manager's TruCluster metrics server is running, issue the following command:

# ps awx | grep clstrmond

If the server is running, you should see output similar to the following:

329 ?? S < 0:16.02 bin/clstrmond

292 ttyp1 S + 0:00.03 grep clstrmond

If clstrmond is not running, it either failed to start or has crashed; see the clstrmond log file, /var/opt/pm/log/clstrmond.log, for the cause. To start clstrmond from the Performance Manager GUI, follow these steps:

  1. From the main window's Execute menu, choose System Management Command Category.
  2. Choose the Start Stop Clstrmond command from this submenu.
  3. Choose the node on which to start clstrmond.
  4. Press OK or Apply to start clstrmond on the selected node.

To start clstrmond from a root account, issue the clstrmond command with the start argument:

/usr/opt/pm/scripts/clstrmond start

If clstrmond is not starting at boot time, ensure that these boot-time startup files exist:

/sbin/rc2.d/K47clstrmond

/sbin/rc3.d/S47clstrmond

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide). The MIB file describing the metrics provided by the TruCluster metrics server is in this location:

/usr/opt/pm/data/cluster_mib

For more information, see the clstrmond(8) reference page.

Metrics servers or GUI will not start

If the GUI or metrics servers fail to start, it could be because their log files are missing.

If pmgrd fails to start automatically when a node is rebooted, but can be started manually, its startup files might be missing.

No log file

The installation procedure creates initial copies of the log files with appropriate protections. For security reasons, the log directory (/var/opt/pm/log) is protected so that no new files can be created in it. If a log file is deleted, an appropriately protected empty file must be left in its place; otherwise, no new process that writes to that log file can be started.

  • The GUI log file is /var/opt/pm/log/pmgr_gui.log.
  • The pmgrd log file is /var/opt/pm/log/pmgrd.log.
  • The clstrmond log file is /var/opt/pm/log/clstrmond.log.
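A quick way to confirm that none of these files has been deleted is to test for each one in turn. The following is a minimal sketch; the function name report_missing is hypothetical. It checks only that each file exists as a regular file. It does not verify the protections.

```shell
# Hypothetical helper, not shipped with Performance Manager.
# Prints "missing: <path>" for each argument that is not a regular file.
# Returns nonzero if at least one file is missing.
report_missing() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example:

# report_missing /var/opt/pm/log/pmgr_gui.log /var/opt/pm/log/pmgrd.log /var/opt/pm/log/clstrmond.log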

No startup files

The installation script writes entries in system startup files that start pmgrd automatically each time a node is rebooted. If pmgrd is not starting on a node after it is booted, check the following files and be sure they have the correct entries:

/sbin/rc2.d/K47pmgrd

/sbin/rc3.d/S47pmgrd

If they are missing, re-install the Performance Manager Daemons & Base subset (see the Performance Manager Installation Guide).
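The check for the two startup files can be scripted. The following is a minimal sketch; the function name startup_files_present and its optional second argument are hypothetical. The sketch tests only that the files exist, not that their contents are correct.

```shell
# Hypothetical helper, not shipped with Performance Manager.
# Succeeds only if both boot-time startup files for the daemon named in $1
# exist. $2 is an optional directory prefix for checking a copy of the file
# tree; when it is omitted, the real /sbin directories are checked.
startup_files_present() {
    daemon=$1
    prefix=${2:-}
    [ -e "$prefix/sbin/rc2.d/K47$daemon" ] &&
        [ -e "$prefix/sbin/rc3.d/S47$daemon" ]
}
```

For example:

# startup_files_present pmgrd || echo "pmgrd startup files are missing"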

Commands not running

If commands fail to run on certain nodes:

  1. Make sure the nodes are up.
  2. Before running commands on remote nodes, you must have a login ID, and the /.rhosts file on each remote node must give root access to the node running the Performance Manager GUI. Specify both a node alias and a fully qualified domain name. For example:

gui_node root

gui_node.usc.edu.com root
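To confirm that a remote node's /.rhosts file contains both forms of the entry, each line can be compared field by field. The following is a minimal sketch; the function name rhosts_allows_root is hypothetical. It assumes the two-field "host user" layout shown in the example above.

```shell
# Hypothetical helper, not shipped with Performance Manager.
# Succeeds if the file named in $1 has a line whose first field is the
# host name given in $2 and whose second field is "root".
rhosts_allows_root() {
    awk -v host="$2" '
        $1 == host && $2 == "root" { found = 1 }
        END { exit !found }' "$1"
}
```

For example, run the following on each remote node:

# rhosts_allows_root /.rhosts gui_node && rhosts_allows_root /.rhosts gui_node.usc.edu.com && echo "root access is configured"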

Disks not visible to Performance Manager

If your kernel configuration does not match your disk configuration, Performance Manager may not recognize the disks that are not configured in the kernel. When you add disks to your system configuration, check that your kernel is configured for the new device. If needed, run the doconfig command to update your kernel. See the doconfig(8) reference page for more information.

Reporting bugs

If an error occurs while installing or using Performance Manager, and you believe the error is caused by a problem with the product, take one of the following actions:

  • If you have a basic or DECsupport Software Agreement, call your Customer Support Center. The Customer Support Center provides high-level advisory and remedial assistance.
  • If you have a Self-Maintenance Software Agreement or you purchased Performance Manager within the past 90 days, you can submit a Software Performance Report.
  • For documentation problems, casual questions, or suggestions, use the response form, or email us at pm_feedback@zso.dec.com.

Software performance reports

When you submit a Software Performance Report, please take the following steps:

  • Reduce the problem to as small a size as possible.
  • Describe as accurately as possible the circumstances and state of the node when the problem occurred. Include the description and version number of Performance Manager being used. Demonstrate the problem with specific examples.
  • Report only one problem per Software Performance Report; this ensures a faster response.
  • Mail the Software Performance Report package to DIGITAL.

Many Software Performance Reports do not contain enough information to duplicate or identify the problem. Concise, complete information helps DIGITAL give accurate and timely service to software problems.
