9    Using the Problem Solving Tools

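To help you resolve problems with network connections and network hardware, the operating system provides problem solving tools for tasks such as detecting network interface failures, testing access to network hosts, displaying network statistics, displaying and modifying address translation tables, displaying a datagram's route, displaying packet headers, and viewing the error log and syslogd message files.

The following sections contain information about using the tools associated with these tasks. For information about additional tools you can use to diagnose network services, see Network Administration: Services.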

9.1    Detecting Network Interface Failures

You can use the Network Interface Failure Finder (NIFF) daemon, niffd, to detect and report possible failures in network interfaces or their connections.

When you enable monitoring for a particular network interface, the system begins tracking changes in the interface's packet counters. As long as the counters continue to increase, the system assumes that the network interface is functioning. If the counters do not increment within a given period of time, the niffd daemon verifies connectivity by generating its own traffic over the interface. If this traffic also fails to increment the counters, the daemon concludes that the interface is not functioning and reports the problem to the Event Manager (EVM) subsystem.

You can review the associated log entries with the Event Viewer, or monitor connectivity problems in real time by using other EVM utilities.

This section describes how to manually configure NIFF to monitor individual interfaces; it does not describe how to provide failover for these interfaces. Although NIFF provides the mechanism that the Redundant Array of Independent Network Adapters (NetRAIN) uses to determine when interfaces have failed, NIFF itself does not provide failover. To configure a NetRAIN set for automatic failover between network interfaces, see Section 2.1.1.2.

9.1.1    Configuring and Deconfiguring NIFF

Use the niffconfig command to enable monitoring for an interface, as follows:

# niffconfig -a interface-id

Replace interface-id with the device name for the network interface you want to monitor, for example, tu0. If necessary, you can specify more interfaces separated by spaces.

In addition, if you want the niffd daemon to continue monitoring an interface after you reboot your system, enter the following commands to enable the daemon in the rc.config file:

# rcmgr set NIFFD "YES"
# rcmgr set NIFFC_FLAGS "-a interface-id"
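
The commands above can be collected into a small dry-run helper. The following is a minimal sketch, not part of NIFF: the function name niff_enable_cmds is hypothetical, and it assumes that NIFFC_FLAGS accepts the same space-separated interface list as the niffconfig command line. It only prints the commands so that you can review them before entering them as root.

```shell
# Hypothetical helper (not supplied with the operating system): print,
# without running, the commands from this section for the given interfaces.
niff_enable_cmds() {
    if [ "$#" -eq 0 ]; then
        echo "usage: niff_enable_cmds interface-id [interface-id ...]" >&2
        return 1
    fi
    ifaces="$*"    # interface names joined by single spaces
    echo "niffconfig -a $ifaces"
    echo "rcmgr set NIFFD \"YES\""
    # Assumption: NIFFC_FLAGS takes the same list as the command line.
    echo "rcmgr set NIFFC_FLAGS \"-a $ifaces\""
}
```

For example, niff_enable_cmds tu0 prints the three commands shown above with tu0 substituted for interface-id.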

You can display a list of the interfaces that the niffd daemon is currently monitoring by entering the niffconfig command with no options:

# niffconfig
Interface:   tu0, status: UP
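
If you monitor several interfaces, you might want to list only those that are not reported as UP. The following sketch assumes that every line of niffconfig output has the form shown above; the function name niff_not_up is hypothetical, and status values other than UP are not described in this section.

```shell
# Read niffconfig output on standard input and print the name of every
# interface whose reported status is anything other than UP.
niff_not_up() {
    awk '/^Interface:/ {
        name = $2
        sub(/,$/, "", name)          # drop the trailing comma after the name
        if ($NF != "UP") print name  # status is the last field on the line
    }'
}
```

A possible use is niffconfig | niff_not_up, which prints nothing when every monitored interface is UP.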

If necessary, you can later disable monitoring for an interface by entering the following command:

# niffconfig -r interface-id

Then, if you have configured the system to continue monitoring when you reboot your system, use the rcmgr command to update the NIFFC_FLAGS parameter. Or, to disable monitoring altogether, enter the following commands:

# rcmgr delete NIFFD
# rcmgr delete NIFFC_FLAGS

See niffconfig(8) and niffd(8) for more information about configuring NIFF.

9.1.2    Viewing NIFF Events

Once NIFF is enabled for one or more interfaces, you can use the Event Viewer to view events related to those interfaces by doing the following:

  1. From the SysMan Menu, select Monitoring and Tuning-->View events to display the Event Viewer.

    Alternatively, enter the following command on a command line:

    # /usr/bin/sysman event_viewer
    

    By default, the Event Viewer lists all events in the logs generated by the syslogd daemon. There could be hundreds or thousands of events; therefore, you will need to suppress everything but the events that NIFF generates.

  2. Select Filter... to create a filter for NIFF events. The Filter dialog box is displayed.

  3. Select the Event Name check box and the associated equal to check box.

  4. Enter the sys.unix.hw.net.niff.* string into the Event Name text field. This string specifically identifies events that NIFF generates.

  5. Optionally, if you want to suppress NIFF informational messages and alerts and view only interface failures, filter the events by priority, as follows:

    1. Select the Priority check box, the associated equal to check box, and the Range check box.

    2. Specify a range of 600-700 in the Range text field.

      Failures are reported with a priority of 600. Informational messages and alerts are reported with a priority of 200; therefore, they will be hidden.

  6. Select OK to save and apply the specified filters.

    If NIFF has generated any events, the Event Viewer displays them.

The Event Viewer displays events that have already been reported. You must select the Refresh option to view additional events as they are reported. Or, you can use the following procedure to send connectivity alerts directly to a terminal on your local console when the niffd daemon reports them to EVM:

  1. Open a new terminal (for example, dtterm or xterm) on your local console.

  2. Execute one of the following commands in the new terminal.

    To display all NIFF events (informational messages, alerts, and failures):

    # evmwatch | evmshow -f "[name sys.unix.hw.net.niff.*]" -t "@timestamp [@priority] @@"
    

    To display only failures:

    # evmwatch -f "[priority >= 600]" | evmshow -f "[name sys.unix.hw.net.niff.*]" -t "@timestamp [@priority] @@"
    

    The terminal will display events as the niffd daemon reports them. The terminal will appear dormant until an event of the appropriate priority is reported.

    You cannot execute additional commands in this terminal until you abort the process by pressing Ctrl/C.

Note that if the system running the niffd daemon contains only one network interface, you cannot use this method to monitor that network interface's connectivity via a remote host. You must be on the local console.

See System Administration for more information about EVM.

9.2    Testing Access to Internet Network Hosts

Use the ping command to test your system's ability to reach a host on the Internet network. The ping command has the following syntax:

/usr/sbin/ping [options... ] hostname

Table 9-1 describes some of the ping command options.

Table 9-1:  Options to the ping Command

Option          Function
-c count        Specifies the number of ECHO_RESPONSE packets to send and receive.
-I interface    Specifies the interface over which to send packets.
-R              Includes the RECORD_ROUTE option in the packet and displays the route buffer on returned packets.
-r              Executes the ping command for a host directly connected to the local host. With this option, the ping command bypasses normal routing tables and sends the request directly to a host on an attached network. If the host is not on a directly attached network, the local host receives an error message.
-V              Specifies the IP version number (4 or 6) of the address returned by the resolver when a host name has both IPv4 and IPv6 addresses. By default, the ping command tries to resolve a host name first as an IPv6 address, then as an IPv4 address.

The ping command sends an Internet Control Message Protocol (ICMP) echo request to the host specified. When the request is successful, the remote host sends the data back to the local host. If the remote host does not respond to the request, the ping command does not display any results.

To terminate the ping command output, press Ctrl/C. When terminated, the ping command displays statistics on packets sent, packets received, the percentage of packets lost, and the minimum, average, and maximum round-trip packet times.

You can use the output from the ping command to help determine the cause of direct and indirect routing problems such as an unreachable host, a timed-out connection, or an unreachable network.

When using the ping command for fault isolation, first test the local host to verify that it is running. If the local host returns the data correctly, use the ping command to test remote hosts farther and farther away from the local host.
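
The fault isolation procedure above can be sketched as a loop that tests hosts in order, nearest first, and stops at the first one that does not answer. This is an illustration only: the function name first_unreachable and the PING_CMD variable are hypothetical, and the sketch assumes that ping -c 1 returns promptly with a nonzero exit status when no reply arrives. See ping(8) to confirm the behavior on your system.

```shell
# Hypothetical helper: ping each host once, in the order given, and print
# the first host that does not answer. Returns 0 if every host answers.
first_unreachable() {
    for host in "$@"; do
        # PING_CMD (an assumption) lets you substitute another ping path.
        if ! ${PING_CMD:-ping} -c 1 "$host" >/dev/null 2>&1; then
            echo "$host"
            return 1
        fi
    done
    return 0
}
```

For example, first_unreachable localhost host1 host2 tests the local host first and then hosts progressively farther away; the host names are placeholders.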

If you do not specify command options, the ping command displays the results of each ICMP request in sequence, the number of bytes received from the remote host, and the round-trip time on a per-request basis.

The following example shows the output from a ping command to a host named host1:

% ping host1
PING host1.corp.com (16.20.32.2): 56 data bytes
64 bytes from 16.20.32.2: icmp_seq=0 ttl=255 time=11 ms
64 bytes from 16.20.32.2: icmp_seq=1 ttl=255 time=3 ms
64 bytes from 16.20.32.2: icmp_seq=2 ttl=255 time=7 ms
64 bytes from 16.20.32.2: icmp_seq=3 ttl=255 time=3 ms
64 bytes from 16.20.32.2: icmp_seq=4 ttl=255 time=7 ms
64 bytes from 16.20.32.2: icmp_seq=5 ttl=255 time=3 ms
[Ctrl/C]
----host1.corp.com PING Statistics---
6 packets transmitted, 6 packets received, 0% packet loss
roundtrip (ms) min/avg/max = 3/5/11 ms
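
To see how the statistics line relates to the individual replies, the following sketch recomputes the minimum, average, and maximum from the time= fields. The function name ping_rtt_summary is hypothetical, and truncating the average to a whole number is an assumption chosen to match the output shown above.

```shell
# Read ping output on standard input and print "min/avg/max" of the
# per-reply round-trip times, taken from the time=N fields.
ping_rtt_summary() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^time=/) {
                t = substr($i, 6) + 0      # numeric part after "time="
                if (n == 0 || t < min) min = t
                if (n == 0 || t > max) max = t
                sum += t
                n++
            }
        }
    }
    END {
        # Assumption: the average is truncated to an integer.
        if (n > 0) printf "%d/%d/%d\n", min, int(sum / n), max
    }'
}
```

Applied to the six replies in the preceding example (11, 3, 7, 3, 7, and 3 ms), the function prints 3/5/11.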

The ping command accepts an IPv4 address, IPv6 address, or node name on the command line. The following example specifies an IPv6 address:

# ping -c 2 5F00:2100:108C:4000:8C40:800:2B2D:2B2
PING (5F00:2100:108C:4000:8C40:800:2B2D:2B2): 56 data bytes
64 bytes from 5F00:2100:108C:4000:8C40:800:2B2D:2B2: icmp6_seq=0
     hlim=58 time=17 ms
64 bytes from 5F00:2100:108C:4000:8C40:800:2B2D:2B2: icmp6_seq=1
     hlim=58 time=17 ms
----5F00:2100:108C:4000:8C40:800:2B2D:2B2 PING Statistics----
2 packets transmitted, 2 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 17/17/17 ms

The command sends appropriate ECHO_REQUEST packets based on the address family being used. In some cases, a single node name might resolve to both an IPv4 and an IPv6 address. Use the -V4 or -V6 option to specify which address to use.

You can also use the -I flag to force the use of a specific interface. For example:

# ping -I ln0 FE80::800:2B2D:2B2

See ping(8) for more information on this command and its options.

9.3    Displaying Network Statistics

Use the netstat command to display network statistics for sockets, interfaces, and routing tables. You can select several forms of display; each allows you to specify the type of information you want to emphasize.

Table 9-2 shows the netstat command options.

Table 9-2:  Options to the netstat Command

Option              Function
-A                  Displays the address of any associated protocol control blocks.
-a                  Includes information for all sockets.
-f address_family   Includes statistics or address control block reports for the specified address family, for example, inet (IPv4) or inet6 (IPv6).
-I interface        Displays information about the specified interface.
-i                  Provides status information for autoconfigured interfaces.
-m                  Displays information about memory management usage.
-n                  Lists network addresses in number form rather than symbolic form.
-r                  Lists routing tables.
-s                  Provides statistics per protocol.
-t                  Displays the time until the interface watchdog routine starts (for use with the -i option).

The -I option provides statistics for a specific interface. See Appendix A for an example of using the -I option to monitor Ethernet, Fiber Distributed Data Interface (FDDI), and token ring interfaces, and a description of the counters, status, and characteristics.

The -i option provides statistics on each configured network interface. Outgoing packet errors (Oerrs) indicate a potential problem with the local host. Incoming errors (Ierrs) indicate a potential problem with the network connected to the interface.

The -f inet and -f inet6 options limit the data displayed to either IPv4 or IPv6, respectively. For example, the netstat -f inet6 -rn command displays only IPv6 routing table entries, as opposed to the default, which displays both IPv4 and IPv6 entries.

The netstat -s command displays statistics for all protocols, including IPv6 and ICMPv6.

The following example shows normal output from the netstat command with the -i option:

% netstat -i
Name  Mtu   Network   Address       Ipkts Ierrs    Opkts Oerrs  Coll
ln0   1500  <Link>                8324125     0  8347463     0 237706
ln0   1500  16.31.16  host1       8324125     0  8347463     0 237706
fza0* 4352  <Link>                      0     0        0     0    0
sl0*  296   <Link>                      0     0        0     0    0
sl1*  296   <Link>                      0     0        0     0    0
tra0  4092  <Link>                     34     0       20     0    0
tra0  4092  16.40.15  host21           34     0       20     0    0
lo0   1536  <Link>                 909234     0   909234     0    0
lo0   1536  loop      localhost    909234     0   909234     0    0

There are no Ierrs or Oerrs, which indicates that there are currently no network connectivity problems.
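
On a system with many interfaces, you can scan the netstat -i output for nonzero error counters instead of reading each row. The following is a minimal sketch that assumes the column layout shown above; the function name netstat_errs is hypothetical. Because the Address column is empty on <Link> rows, the sketch counts fields from the end of each line.

```shell
# Read "netstat -i" output on standard input and print one line for each
# interface that reports a nonzero Ierrs or Oerrs counter.
netstat_errs() {
    awk '$1 != "Name" && NF >= 8 {
        # Columns counted from the right: Coll is the last field,
        # Oerrs is one before it, and Ierrs is three before it.
        ierrs = $(NF - 3) + 0
        oerrs = $(NF - 1) + 0
        if ((ierrs > 0 || oerrs > 0) && !seen[$1]++)
            printf "%s Ierrs=%d Oerrs=%d\n", $1, ierrs, oerrs
    }'
}
```

A possible use is netstat -i | netstat_errs. For the output in the preceding example, the function prints nothing because every Ierrs and Oerrs value is zero.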

See netstat(1) for more information about this command and its options.

9.4    Displaying and Modifying the Internet (IPv4) to MAC Address Translation Tables

You can display and modify the Internet to Media Access Control (MAC) address translation tables used by the Address Resolution Protocol (ARP) to help diagnose direct IPv4 routing problems.

Use the arp -a command to display the entries in the Internet-to-MAC address translation tables. To modify the tables, log in as root and use the arp command as follows:

/usr/sbin/arp [options ] hostname

The following example shows the Ethernet address for an IPv4 host named host1. The system response tells you that the Ethernet address for host1 is aa-00-04-00-8f-11.

# /usr/sbin/arp host1
host1 (16.20.32.2) at aa:0:4:0:8f:11 permanent
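
The arp command prints each octet of the address without leading zeros and separated by colons, whereas the text above writes the same address with two digits per octet and hyphens. The following sketch converts one form to the other so that you can compare addresses from different sources; the function name mac_normalize is hypothetical.

```shell
# Convert a colon-separated MAC address as printed by arp (for example,
# aa:0:4:0:8f:11) to two-digit octets separated by hyphens.
mac_normalize() {
    echo "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            # Pad single-digit octets with a leading zero.
            octet = (length($i) == 1) ? "0" $i : $i
            out = out (i > 1 ? "-" : "") octet
        }
        print out
    }'
}
```

For example, mac_normalize aa:0:4:0:8f:11 prints aa-00-04-00-8f-11.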

The following example shows how to temporarily add host9 to the system translation tables:

# /usr/sbin/arp -s host9 0:dd:0:a:85:0 temp

The following example shows how to remove host8 from the system translation tables:

# /usr/sbin/arp -d host8

See arp(8) for more information on this command.

9.5    Displaying a Datagram's Route to a Network Host

You can display a datagram's route to a network host to manually test, measure, and manage the network.

To display a datagram's route, use the traceroute command with the following syntax:

traceroute [options...] hostname [packetsize]

Table 9-3 describes some of the traceroute command options.

Table 9-3:  Options to the traceroute Command

Option                    Function
-m max_ttl                Sets the maximum time-to-live (ttl) used in outgoing probe packets. The ttl parameter specifies the maximum number of hops a packet can take to reach its destination. The default is 30 hops.
-n                        Displays hop addresses numerically only, rather than both numerically and symbolically.
-p port                   Sets the base User Datagram Protocol (UDP) port number to be used in outgoing probe packets. The default is 33434. The port information is used to select an unused port range if a port in the default range is already used.
-r                        Bypasses the normal routing tables and sends the probe packet directly to a host on an attached network. If the host is not on a directly attached network, the traceroute command returns an error.
-s IP_address_number      Uses the specified IP address number as the source address in outgoing probe packets. On hosts with more than one IP address, this option forces the traceroute command to use the specified source address rather than any others the host might have. If the IP address is not one of the local host's interface addresses, the command returns an error and does not send a probe packet.
-t type-of-service value  Sets the type-of-service in probe packets to the specified value. The default is zero. The value must be a decimal integer in the range 0-255. This option tells you if different types of service result in different paths. This option is available only in Berkeley UNIX (4.4BSD) environments. Not all types of service are legal or meaningful. Useful values for this option are 16 (low delay) and 8 (high throughput). See RFC 791, Internet Protocol, for more information on types of service.
-v                        Displays verbose output, which includes received ICMP messages other than time exceeded and port unreachable.
-V version                Specifies the IP version number (4 or 6) of the address returned by the resolver when a host name has both IPv4 and IPv6 addresses. By default, the traceroute command tries to resolve a host name first as an IPv6 address, then as an IPv4 address.
-w wait_time              Sets the time (in seconds) to wait for a response to a probe. The default is 3 seconds.
packetsize                Sets the packet size (in bytes) for the probe packet. The default size is 38 bytes.

The traceroute command sends UDP packets (known as probe packets) to an unused port on the remote host, and listens for ICMP replies from IP routers. It sends the probe packets with a small ttl parameter, which specifies the maximum number of hops a packet can take to reach its destination. The traceroute command starts by specifying a ttl of one hop and it increases the ttl by one for each probe packet it sends. It continues sending probe packets until a packet reaches the destination or until the ttl reaches the maximum number of hops.

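In response to each probe packet, the traceroute command can receive an ICMP time exceeded message, which indicates that an IP router discarded the packet because its ttl expired, or an ICMP port unreachable message, which indicates that the packet reached the destination host.

The traceroute command sends three probe packets (datagrams) for each ttl setting and displays a line showing the ttl, the name and address of the responding IP router or host, and the round-trip time of each probe.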

If multiple IP routers respond to the probe, the traceroute command displays the address of each IP router. If the traceroute command does not elicit a response in 3 seconds (the default wait time), an asterisk (*) is displayed for the probe.

The following example shows a successful traceroute command to host2:

% traceroute host2
traceroute to host2 (555.55.5.5), 30 hops max, 40 byte packets
 1  host3 (555.55.5.1) 2 ms 2 ms 2 ms
 2  host5 (555.55.5.2) 5 ms 6 ms 4 ms
 3  host7 (555.55.5.3) 7 ms 7 ms 6 ms
 4  host2 (555.55.5.5) 12 ms 8 ms 8 ms
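
When comparing routes, it can help to reduce the three round-trip times on each line to a single average. The following sketch assumes the output layout shown above, with one responding router per hop; the function name traceroute_avg is hypothetical.

```shell
# Read traceroute output on standard input and print, for each hop, the
# hop number, the first responding host, and the average round-trip time.
traceroute_avg() {
    awk '$1 ~ /^[0-9]+$/ {
        sum = 0; n = 0
        # A round-trip time is any field that is followed by "ms".
        for (i = 2; i < NF; i++) {
            if ($(i + 1) == "ms") { sum += $i; n++ }
        }
        if (n > 0)
            printf "%s %s %.1f\n", $1, $2, sum / n
        else
            printf "%s * no reply\n", $1   # every probe timed out
    }'
}
```

For the fourth hop in the preceding example (12, 8, and 8 ms), the function prints 4 host2 9.3.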

The traceroute command with the host argument prints the route that packets take to both IPv4 and IPv6 hosts.

See traceroute(8) for more information about this command and its options.

9.6    Displaying Headers of Packets on the Network

Display packet headers when you want to monitor the network traffic associated with a particular network service. You usually do this to determine whether requests are being received or acknowledged or, when network performance is slow, to determine the source of network requests.

Use the tcpdump command to display packet headers for a network interface. This command enables you to specify the interface on which to listen, the direction of the packet transfer, and the type of protocol traffic to display. In addition, it enables you to identify the source of the packet. See tcpdump(8) for more information.

Note

In order to use the tcpdump command, the packetfilter option must be configured into the kernel and the system rebooted. See packetfilter(7) for more information.
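
As an illustration of combining the interface, direction, and traffic selections described above, the following sketch assembles a tcpdump command line and prints it without running it. The function name tcpdump_cmd and its argument order are hypothetical; the -i option and the src, dst, host, and port keywords are standard tcpdump syntax, but see tcpdump(8) to confirm what your version supports.

```shell
# Hypothetical helper: print (do not run) a tcpdump command line.
#   $1 interface, $2 direction (src, dst, or any), $3 host, $4 optional port
tcpdump_cmd() {
    iface=$1; dir=$2; host=$3; port=$4
    case $dir in
        src|dst) filter="$dir host $host" ;;  # one direction only
        *)       filter="host $host" ;;       # traffic in both directions
    esac
    if [ -n "$port" ]; then
        filter="$filter and port $port"
    fi
    echo "tcpdump -i $iface $filter"
}
```

For example, tcpdump_cmd ln0 src host1 23 prints tcpdump -i ln0 src host host1 and port 23; the interface, host, and port are placeholders.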

9.7    Viewing the Error Log File

To diagnose kernel and hardware errors, you can look at the system events that occurred prior to the errors. Messages from system events, such as error messages relating to the software kernel and system hardware, and informational messages about system status, startup, and diagnostics, are recorded in the binary error log file, /var/adm/binary.errlog.

Because this log file is in binary format, the operating system offers special utilities, DECevent and Compaq Analyze, that read the binary log file and run the data through a formatter to display the information. See dia(8) and ca(8) for more information about DECevent and Compaq Analyze, respectively.

Note that these utilities are not available in the operating system by default; you must install the Web-Based Enterprise Services (WEBES) kit, a suite of diagnostic utilities, to obtain them. WEBES is available for installation from the Associated Product CD-ROMs or for download from the following URL:

http://www.support.compaq.com/svctools/webes

See the System Administration manual for information about using the Event Viewer to present errors as interpreted by DECevent and Compaq Analyze. Also, see uerf(8) for an alternative to these utilities.

9.8    Viewing the syslogd Daemon Message Files

You can use the syslogd daemon to help diagnose session layer problems such as access control problems for the Internet Protocol Version 4 (IPv4) and Internet Protocol Version 6 (IPv6).

The syslogd daemon starts running when you boot the system and rereads its configuration file whenever it receives a hangup signal. By default, it records system messages in a set of files in the /var/adm/syslog.dated directory (as specified in the /etc/syslog.conf file). The system messages can indicate error conditions or warnings, depending on the priority codes they contain.

Although it is possible to review the contents of the system message files from the command line, it is best to use the Event Viewer that is part of the SysMan Menu utility, because it simplifies access to the files and makes it easier for you to find particular problems. To start the Event Viewer, invoke the SysMan Menu as described in Section 1.2.1, then select Monitoring and Tuning-->View events. Alternatively, you can invoke the Event Viewer from a command line by entering the following command:

# /usr/bin/sysman event_viewer

Once the Event Viewer is displayed, you can use it to sort the log entries, filter the entries (for a certain event name, priority level, posting host, or date), and obtain more detailed information about individual entries.

For more information about event management and accessing the system log files, see evm(5), syslogd(8), the System Administration manual, and the online help.