Before you start to monitor your system to identify a performance problem, you should understand your user environment, the applications you are running and how they use the various subsystems, and what is acceptable performance.
The source of the performance problem may not be obvious. For example, if your disk I/O subsystem is swamped with activity, the problem may be in either the virtual memory subsystem or the disk I/O subsystem. In general, obtain as much information as possible about the system before you attempt to tune it.
In addition, how you decide to tune your system depends on how your users and applications utilize the system. For example, if you are running CPU-intensive applications, the virtual memory subsystem may be more important than the unified buffer cache (UBC).
This chapter contains the following information:
Numerous system monitoring tools are available. You may have to use various tools in combination with each other in order to get an accurate picture of your system. In addition to obtaining information about your system when it is running poorly, it is also important for you to obtain information about your system when it is running well. By comparing the two sets of data, you may be able to pinpoint the area that is causing the performance problem.
The primary monitoring tools are described in Table 2-1.
Other tools can also provide you with important monitoring information. These secondary monitoring tools are described in Table 2-2.
Tool | Description |
atom | Serves as a general-purpose framework for creating sophisticated program analysis tools. It includes numerous unsupported prepackaged tools and the following supported tools: third, hiprof, and pixie. The third tool performs memory access checks and detects memory leaks in an application. The hiprof tool produces either a flat or hierarchical profile of an application. The flat profile shows the execution time spent in a given procedure, and the hierarchical profile shows the execution time spent in a given procedure and all of its descendents. The pixie tool partitions an application into basic blocks and counts the number of times each basic block is executed. For details, see the Programmer's Guide or atom(1). |
dbx | Analyzes running kernels and dump files. The dbx command invokes a source-level debugger. You can use dbx with code produced by the cc and as compilers and with machine code. After invoking the dbx debugger, you issue dbx commands that allow you to examine source files, control program execution, display the state of the program, and debug at the machine-code level. To analyze kernels, use the -k option. See Section 2.2.9 for more information on using the dbx command to diagnose system performance problems. |
dumpfs | Displays UFS file system information. This command is useful for getting information about the file system block and fragment size and the minimum free space percentage. See Section 2.2.6 for more information on using the dumpfs command to diagnose system performance problems. |
gprof | Displays call graph profile data showing the effects of called routines. Similar to the prof utility. For details, see the Programmer's Guide or gprof(1). |
ipcs | Reports interprocess communication (IPC) statistics. The ipcs command displays information about currently active message queues, shared-memory segments, semaphores, remote queues, and local queue headers. Information provided in the following fields by the ipcs -a command can be especially useful: QNUM, the number of messages currently outstanding on the associated message queue; CBYTES, the number of bytes in messages currently outstanding on the queue; QBYTES, the maximum number of bytes allowed in messages outstanding on the queue; SEGSZ, the size of the associated shared memory segment; and NSEMS, the number of semaphores in the set associated with the semaphore entry. See ipcs(1) for details. |
kdbx | Analyzes running kernels and dump files. The kdbx debugger is an interactive program that lets you examine either the running kernel or dump files created by the savecore utility. In either case, you will be examining an object file and a core file. For running systems, these files are usually /vmunix and /dev/mem, respectively. Dump files created by savecore are saved in the directory specified by the /sbin/init.d/savecore script, which by default is /var/adm/crash. All dbx commands are available in kdbx using the dbx option. See the manual Kernel Debugging or kdbx(8) for details. |
kprofile | Profiles the kernel using the performance counters in the hardware. See the manual Kernel Debugging or kprofile(1) for details. |
nfswatch | Monitors all NFS network traffic and divides it into several categories. The number and percentage of packets received in each category appears on the screen in a continuously updated display. Your kernel must be configured with the packetfilter option. See nfswatch(8) and packetfilter(7) for details. |
pixie | Provides basic block counting data when used with prof. |
prof | Displays statistics on where time is being spent - at the routine level, basic block level, or instruction level - during the execution of a program. This information will help you to determine where to concentrate your efforts to optimize source code. |
showfdmn | Displays the attributes of an AdvFS file domain and detailed information about each volume in the file domain. |
showfile | Displays the full storage allocation map (extent map) for files in an Advanced File System (AdvFS). An extent is a contiguous area of disk space that the file system allocates to a file. |
showfsets | Displays the filesets (or clone filesets) and their characteristics in a specified domain. |
swapon | Specifies additional disk space for paging and swapping and displays swap space utilization, including the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. See Section 2.2.4 for more information on using the swapon command to diagnose system performance. |
tcpdump | Displays network traffic. The tcpdump command prints out the headers of packets on a network interface that match the Boolean expression. Your kernel must be configured with the packetfilter option. See tcpdump(8) and packetfilter(7) for details. |
uprofile | Profiles user code using performance counters in the hardware. See uprofile(1) for details. |
voldg | Displays, with the list option, information about an LSM diskgroup's attributes. See voldg(8) for details. |
voldisk | Displays, with the list option, a disk's configuration and attribute information. See voldisk(8) for details. |
volprint | Displays information from records in the LSM configuration database. See volprint(8) for more information. |
volstat | Displays Logical Storage Manager statistics for LSM volumes, plexes, subdisks, or disks. See volstat(8) for details. |
voltrace | Prints records from an event log. Sets event trace masks to determine what type of events will be tracked. See voltrace(8) for more information. |
volwatch | Monitors LSM for failure events and sends mail to the specified user. See volwatch(8) for more information. |
w | Displays a summary of current system activity. The system summary shows the current time, the amount of time since the system was last started, the number of users logged in to the system, and the load averages. The load average numbers give the number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds. See w(1) for details. |
xload | Displays the system load average for X. The xload command displays a periodically updating histogram of the system load average. See xload(1X) for details. |
POLYCENTER Performance Solution, a layered product, is also available as a monitoring tool. It can monitor many Digital UNIX nodes simultaneously. A single-node version of the product is included with the operating system at no extra charge.
POLYCENTER Performance Solution has a graphical user interface (GUI) called Performance Manager. Performance Manager is a real-time performance monitor that allows you to detect and correct performance problems. Graphs and charts can show hundreds of different system values, including CPU performance, memory usage, disk transfers, file-system capacity, network efficiency, and AdvFS and cluster-specific metrics.
Thresholds can be set to alert you to or correct a problem when it occurs, and archives of data can be kept for high-speed playback or long-term trend analysis.
Performance Manager has performance analysis and system management scripts, as well as cluster-specific and AdvFS-specific scripts. These scripts can be run simultaneously on multiple nodes from the GUI.
Performance Manager automatically discovers cluster members when a single cluster member node is specified, and it can monitor both individual cluster members and an entire cluster concurrently.
For details on POLYCENTER Performance Solution, see the manual POLYCENTER Performance Solution for UNIX Systems: User's Guide.
The following sections describe how to use monitoring tools to identify the system component or subsystem that is causing a performance degradation. Once you determine which subsystem or component is causing the problem and you are sure that you understand your system environment and the needs of your users, refer to the appropriate section in Chapter 3 for information on tuning the particular subsystem or component.
The ps command displays the current status of the system processes. You can use it to determine the current running processes, their state, and how they utilize system memory. The command lists processes in order of decreasing CPU usage, so you can easily determine which processes are using the most CPU time. Be aware that ps is only a snapshot of the system; by the time the command finishes executing, the system state has probably changed. For example, one of the first lines of the command may refer to the ps command itself.
An example of the ps command follows:
# ps aux
USER       PID  %CPU %MEM    VSZ    RSS  TTY  S     STARTED         TIME COMMAND
chen      2225   5.0  0.3  1.35M   256K  p9   U    13:24:58      0:00.36 cp /vmunix /tmp
root      2236   3.0  0.5  1.59M   456K  p9   R +  13:33:21      0:00.08 ps aux
sorn      2226   1.0  0.6  2.75M   552K  p9   S +  13:25:01      0:00.05 vi met.ps
root       347   1.0  4.0  9.58M   3.72  ??   S    Nov 07       01:26:44 /usr/bin/X11/X -a
root      1905   1.0  1.1  6.10M   1.01  ??   R    16:55:16      0:24.79 /usr/bin/X11/dxpa
sorn      2228   0.0  0.5  1.82M   504K  p5   S +  13:25:03      0:00.02 more
sorn      2202   0.0  0.5  2.03M   456K  p5   S    13:14:14      0:00.23 -csh (csh)
root         0   0.0 12.7   356M   11.9  ??   R <  Nov 07     3-17:26:13 [kernel idle]
The ps command includes the following information that you can use to diagnose CPU and virtual memory problems:
From the output of the ps command, you can determine which processes are consuming most of your system's CPU time and memory and whether processes are swapped out. Concentrate on processes that are runnable or paging. Here are some concerns to keep in mind:
If a process using a large amount of CPU time is running correctly, you may want to lower its priority with either the nice or renice command. Note that these commands have no effect on memory usage by a process.
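For example, assuming the cp process with PID 2225 shown in the earlier ps output were consuming too much CPU time, you could lower its priority with a command similar to the following (the PID shown is purely illustrative):

# renice 4 -p 2225

The renice command changes only the scheduling priority of the running process; as noted above, it does not reduce the amount of memory the process uses.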
For information about memory tuning, see Section 3.4. For information about improving the performance of your applications, see the Programmer's Guide.
The uptime command shows how long a system has been running and the load average. The load average counts jobs that are waiting for disk I/O and also applications whose priorities have been changed with either the nice or renice command. The load average numbers give the average number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds.
An example of the uptime command follows:
# uptime
1:48pm up 7 days, 1:07, 35 users, load average: 7.12, 10.33, 10.31
Note whether the load is increasing or decreasing. An acceptable load average depends on your type of system and how it is being used. In general, for a large system, a load of 10 is high, and a load of 3 is low. Workstations should have a load of 1 or 2. If the load is high, look at what processes are running with the ps command. You may want to run some applications during off-peak hours. You can also lower the priority of applications with the nice or renice command to conserve CPU cycles.
See Section 3.2 for additional information on how to reduce the load on your system.
The vmstat command shows the virtual memory, process, and total CPU statistics for a specified time interval. The first line of the output is for all time since a reboot, and each subsequent report is for the last interval. Because the CPU operates faster than the rest of the system, performance bottlenecks usually exist in the memory or I/O subsystems.
An example of the vmstat command follows:
# vmstat 1
Virtual Memory Statistics: (pagesize = 8192)
  procs    memory          pages                                intr          cpu
  r w u   act free wire   fault  cow zero react  pin pout    in   sy   cs  us sy id
  2 66 25 6417 3497 1570   155K  38K  50K     0  46K    0     4  290  165   0  2 98
  4 65 24 6421 3493 1570    120    9   81     0    8    0   585  865  335  37 16 48
  2 66 25 6421 3493 1570     69    0   69     0    0    0   570  968  368   8 22 69
  4 65 24 6421 3493 1570     69    0   69     0    0    0   554  768  370   2 14 84
  4 65 24 6421 3493 1570     69    0   69     0    0    0   865   1K  404   4 20 76
The vmstat command includes information that you can use to diagnose CPU and virtual memory problems. The following fields are particularly important:
While diagnosing a bottleneck situation, keep the following issues in mind:
You must understand how your applications use the system to determine the appropriate values for these times. The goal is to keep the CPU as productive as possible. Idle CPU cycles occur when no runnable processes exist or when the CPU is waiting to complete an I/O or memory request.
The following list presents information on how to interpret the values for user, idle, and system time:
Note that a high system time and low idle time could be caused by failing hardware. Use the uerf command to check your hardware.
A high system time could also indicate that the system is thrashing; that is, the amount of memory available to the virtual memory subsystem has gotten so low that the system is spending all its time paging and swapping in an attempt to regain memory. A system that spends more than 50 percent of its time in system mode or idle mode may be doing a lot of I/O, so this could indicate a virtual memory problem.
If you have a high idle time and you are sure that your system has a typical load, one or more of the following problems may exist: the hardware may be saturated (bus bandwidth, arm motion, CPU cycles, or cache thrashing), one or more kernel data structures may be exhausted, or you may have a hardware or kernel resource block, such as an application, I/O, or network bottleneck.
See Chapter 3 for information on improving CPU usage and I/O operations and for information on tuning virtual memory, disks, and file systems.
Use the swapon command with the -s option to display your swap device configuration. For each swap partition, the command displays the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. This information should help you determine how your swap space is being utilized. For example:
# swapon -s
Swap partition /dev/rz2b (default swap):
    Allocated space:        16384 pages (128MB)
    In-use space:               1 pages (  0%)
    Free space:             16383 pages ( 99%)

Swap partition /dev/rz12c:
    Allocated space:       128178 pages (1001MB)
    In-use space:               1 pages (  0%)
    Free space:            128177 pages ( 99%)

Total swap allocation:
    Allocated space:       144562 pages (1129MB)
    Reserved space:          2946 pages (  2%)
    In-use space:               2 pages (  0%)
    Available space:       141616 pages ( 97%)
See Section 3.4.2.1 for information on how to tune your swap space configuration. Use the iostat command to determine which disks are being used the most.
The iostat command reports I/O statistics for terminals, disks, and the CPU. The first line of the output is the average since boot time, and each subsequent report is for the last interval. An example of the iostat command is as follows:
# iostat 1
      tty       rz1        rz2        rz3         cpu
 tin  tout  bps  tps   bps  tps   bps  tps   us ni sy id
   0     3    3    1     0    0     8    1   11 10 38 40
   0    58    0    0     0    0     0    0   46  4 50  0
   0    58    0    0     0    0     0    0   68  0 32  0
   0    58    0    0     0    0     0    0   55  2 42  0
The iostat command reports I/O statistics that you can use to diagnose disk I/O performance problems. For example, the command displays information about the following:
Note the following when you use the iostat command:
See Section 3.6 for information on how to improve your disk I/O performance.
The dumpfs command dumps UFS information. The command prints out the super block and cylinder group information. The command is useful for getting information about the file system block and fragment sizes and the minimum free space percentage.
The following example shows part of the output of the dumpfs command:
# dumpfs /dev/rrz3g | more
magic     11954    format   dynamic     time   Tue Sep 14 15:46:52 1993
nbfree    21490    ndir     9           nifree 99541     nffree 60
ncg       65       ncyl     1027        size   409600    blocks 396062
bsize     8192     shift    13          mask   0xffffe000
fsize     1024     shift    10          mask   0xfffffc00
frag      8        shift    3           fsbtodb 1
cpg       16       bpg      798         fpg    6384      ipg    1536
minfree   10%      optim    time        maxcontig 8      maxbpg 2048
rotdelay  0ms      headswitch 0us       trackseek 0us    rps    60
The information contained in the first lines is relevant for tuning. Of specific interest are the following fields:
Keep the following issues in mind:
For information about tuning UFS file system configuration parameters and sysconfigtab configuration attributes to improve your disk I/O performance, see Section 3.6.1.2.
You can use the advscan, showfdmn, showfile, and showfsets commands to display information about AdvFS.
See Section 3.6.1.3 for information about tuning AdvFS.
The advscan command locates pieces of AdvFS domains on disk partitions and in LSM disk groups. Use the advscan command when you have moved disks to a new system, have moved disks around in a way that has changed device numbers, or have lost track of where the domains are. The command is also used for repair if you delete /etc/fdmns, delete a directory domain under /etc/fdmns, or delete some links from a domain directory under /etc/fdmns.
The advscan command accepts a list of volumes or disk groups and searches all partitions and volumes in each. It determines which partitions on a disk are part of an AdvFS file domain. You can run the advscan command to rebuild all or part of your /etc/fdmns directory or you can rebuild it by hand by supplying the names of the partitions in a domain.
The following example scans devices rz0 and rz5 for AdvFS partitions:
# advscan rz0 rz5

Scanning disks  rz0 rz5
Found domains:

usr_domain
        Domain Id         2e09be37.0002eb40
        Created           Thu Jun 23 09:54:15 1994
        Domain volumes    2
        /etc/fdmns links  2
        Actual partitions found:
                          rz0c
                          rz5c
For the following example, the rz6 domains were removed from /etc/fdmns. The advscan command scans device rz6 and re-creates the missing domains.
# advscan -r rz6

Scanning disks  rz6
Found domains:

*unknown*
        Domain Id         2f2421ba.0008c1c0
        Created           Mon Jan 23 13:38:02 1995
        Domain volumes    1
        /etc/fdmns links  0
        Actual partitions found:
                          rz6a*

*unknown*
        Domain Id         2f535f8c.000b6860
        Created           Tue Feb 28 09:38:20 1995
        Domain volumes    1
        /etc/fdmns links  0
        Actual partitions found:
                          rz6b*

Creating /etc/fdmns/domain_rz6a/
        linking rz6a

Creating /etc/fdmns/domain_rz6b/
        linking rz6b
See advscan(8) for details on the advscan command.
The showfdmn command displays the attributes of an AdvFS file domain and detailed information about each volume in the file domain. The following example of the showfdmn command displays domain information for the /usr file domain:
% showfdmn usr

               Id              Date Created  LogPgs  Domain Name
2b5361ba.000791be  Tue Jan 12 16:26:34 1993     256  usr

  Vol  512-Blks    Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L    820164  351580     57%     on    256    256  /dev/rz0d
See showfdmn(8) for information about the output of the command.
The showfile command displays the full storage allocation map (extent map) for files in an Advanced File System (AdvFS). An extent is a contiguous area of disk space that the file system allocates to a file. The following example of the showfile command displays the AdvFS-specific attributes for all of the files in the current working directory:
# showfile *

        Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  Log  Perf  File
   22a.001    1    16      1    simple    **     **  off   50%  Mail
     7.001    1    16      1    simple    **     **  off   20%  bin
   1d8.001    1    16      1    simple    **     **  off   33%  c
  1bff.001    1    16      1    simple    **     **  off   82%  dxMail
   218.001    1    16      1    simple    **     **  off   26%  emacs
   1ed.001    1    16      0    simple    **     **  off  100%  foo
   1ee.001    1    16      1    simple    **     **  off   77%  lib
   1c8.001    1    16      1    simple    **     **  off   94%  obj
   23f.003    1    16      1    simple    **     **  off  100%  sb
  170a.008    1    16      2    simple    **     **  off   35%  t
     6.001    1    16     12    simple    **     **  off   16%  tmp
The following example of the showfile command shows the attributes and extent information for the mail file, which is a simple file:
# showfile -x mail

        Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  Log  Perf  File
 4198.800d    2    16     27    simple    **     **  off   66%  tutorial

    extentMap: 1
        pageOff    pageCnt    vol    volBlock    blockCnt
              0          5      2      781552          80
              5         12      2      785776         192
             17         10      2      786800         160
        extentCnt: 3
See showfile(8) for information about the output of the command.
The showfsets command displays the filesets (or clone filesets) and their characteristics in a specified domain.
The following is an example of the showfsets command:
# showfsets dmn

mnt
        Id           : 2c73e2f9.000f143a.1.8001
        Clone is     : mnt_clone
        Files        :       79,  limit =     1000
        Blocks (1k)  :      331,  limit =    25000
        Quota Status : user=on  group=on

mnt_clone
        Id           : 2c73e2f9.000f143a.2.8001
        Clone of     : mnt
        Revision     : 1
See showfsets(8) for information about the output of the command.
A number of commands are available to display LSM-related information and to monitor LSM-related activity:
In addition, you can use the Analyze menu in LSM's graphical interface (dxlsm) to monitor activity on volumes, LSM disks, and subdisks.
See the manual Logical Storage Manager for more information about monitoring LSM and for information about LSM performance management.
The voldg list command displays brief information about the attributes of LSM disk groups. If you specify a particular disk group, the command displays more detailed information on the status and configuration of the specified group.
The following example uses the voldg list command to display information about the rootdg disk group:
# voldg list rootdg

Group:     rootdg
dgid:      795887625.1025.system32
import-id: 0.1
flags:
config:    seqno=0.1351 permlen=347 free=316 templen=9 loglen=52
config disk rz9  copy 1 len=347 state=clean online
config disk rz10 copy 1 len=347 state=clean online
config disk rz12 copy 1 len=347 state=clean online
config disk rz15 copy 1 len=347 state=clean online
config disk rz11 copy 1 len=347 state=clean online
config disk rz13 copy 1 len=347 state=clean online
log disk rz8  copy 1 len=200
log disk rz8  copy 2 len=200
log disk rz9  copy 1 len=52
log disk rz10 copy 1 len=52
log disk rz12 copy 1 len=52
log disk rz15 copy 1 len=52
log disk rz11 copy 1 len=52
log disk rz13 copy 1 len=52
log disk rz3  copy 1 len=200
log disk rz3  copy 2 len=200
For more information, see the voldg(8) reference page.
The voldisk list command displays the device names for all recognized disks, the disk names, the disk group names associated with each disk, and the status of each disk.
The following example uses the voldisk list command to display information about the rz15 disk:
# voldisk list rz15

Device:    rz15
devicetag: rz15
type:      sliced
hostid:    system32
disk:      name=rz15 id=795887633.1049.system32
group:     name=rootdg id=795887625.1025.system32
flags:     online ready private imported
pubpaths:  block=/dev/rz15g char=/dev/rrz15g
privpaths: block=/dev/rz15h char=/dev/rrz15h
version:   1.1
iosize:    512
public:    slice=6 offset=0 len=2697533
private:   slice=7 offset=0 len=512
update:    time=795888426 seqno=0.18
headers:   0 248
configs:   count=1 len=347
logs:      count=1 len=52
Defined regions:
 config   priv    17-   247[   231]: copy=01 offset=000000
 config   priv   249-   364[   116]: copy=01 offset=000231
 log      priv   365-   416[    52]: copy=01 offset=000000
For more information, see the voldisk(8) reference page.
The volprint command displays information from records in the LSM configuration database. You can select the records to be displayed by name or using special search expressions. In addition, you can display record association hierarchies, so that the structure of records is more apparent.
Use the volprint command to display disk group, disk media, volume, plex, and subdisk records. Use the voldisk list command to display disk access records, or physical disk information.
The following example uses the volprint command to show the status of the voldev1 volume:
# volprint -ht voldev1

DG NAME         GROUP-ID
DM NAME         DEVICE     TYPE     PRIVLEN  PUBLEN    PUBPATH
V  NAME         USETYPE    KSTATE   STATE    LENGTH    READPOL   PREFPLEX
PL NAME         VOLUME     KSTATE   STATE    LENGTH    LAYOUT    ST-WIDTH  MODE
SD NAME         PLEX       PLOFFS   DISKOFFS LENGTH    DISK-NAME DEVICE

v  voldev1      fsgen      ENABLED  ACTIVE   804512    SELECT    -
pl voldev1-01   voldev1    ENABLED  TEMP     804512    CONCAT    -         WO
sd rz8-01       voldev1-01 0        0        804512    rz8       rz8
pl voldev1-02   voldev1    ENABLED  ACTIVE   804512    CONCAT    -         RW
sd dev1-01      voldev1-02 0        2295277  402256    dev1      rz9
sd rz15-02      voldev1-02 402256   2295277  402256    rz15      rz15
For more information, see the volprint(8) reference page.
The volstat command provides information about activity on volumes, plexes, subdisks, and disks under LSM control. It reports statistics that reflect the activity levels of LSM objects since boot time.
The amount of information displayed depends on what options you specify to volstat. For example, you can display statistics for a specific LSM object, or you can display statistics for all objects at one time. You can also specify a disk group, in which case, only statistics for objects in that disk group are displayed; if you do not specify a particular disk group, volstat displays statistics for the default disk group (rootdg).
The volstat command can also be used to reset the statistics information to zero. This can be done for all objects or for only specified objects. Resetting just prior to a particular operation makes it possible to measure the subsequent impact of that particular operation.
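For example, a command similar to the following might be used to reset the statistics for all objects in the default disk group immediately before starting a test workload (the -r reset option is an assumption here; confirm the exact reset syntax in volstat(8) on your system):

# volstat -r

Running volstat again after the workload completes then shows only the activity generated since the reset.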
The following example shows statistics on LSM volumes.
# volstat

                      OPERATIONS            BLOCKS          AVG TIME(ms)
TYP NAME            READ     WRITE      READ      WRITE     READ    WRITE
vol archive          865       807      5722       3809     32.5     24.0
vol home            2980      5287      6504      10550     37.7    221.1
vol local          49477     49230    507892     204975     28.5     33.5
vol src            79174     23603    425472     139302     22.4     30.9
vol swapvol        22751     32364    182001     258905     25.3    323.2
For more information, see the volstat(8) reference page.
The voltrace command reads an event log (/dev/volevent) and prints formatted event log records to standard output. Using voltrace, you can set event trace masks to determine which type of events will be tracked. For example, you can trace I/O events, configuration changes, or I/O errors.
The following sample voltrace command shows status on all new events.
# voltrace -n -e all

18446744072623507277 IOTRACE 439: req 3987131 v:rootvol p:rootvol-01 \
    d:root_domain s:rz3-02 iot write lb 0 b 63120 len 8192 tm 12
18446744072623507277 IOTRACE 440: req 3987131 \
    v:rootvol iot write lb 0 b 63136 len 8192 tm 12
For more information, see the voltrace(8) reference page.
The volwatch command monitors LSM for failure events and sends mail to the specified user.
For more information, see the volwatch(8) reference page.
LSM's graphical interface (dxlsm) includes an Analyze menu. The Analyze menu allows you to display statistics about volumes, LSM disks, and subdisks. The information is displayed graphically, using colors and patterns on the disk icons, and numerically, using the Analysis Statistics form. You can use the Analysis Parameters form to tailor the information that will be displayed.
See the manual Logical Storage Manager for information about dxlsm.
System parameters are global variables. You can monitor the settings of these variables by using the Kernel Tuner or the sysconfig command. As explained in Section 2.2.10, you can also do this monitoring with dbx.
The Kernel Tuner (dxkerneltuner) is provided by the Common Desktop Environment's (CDE) graphical user interface. To access the Kernel Tuner, click on the Application Manager icon in the CDE menu bar and then select the Monitoring/Tuning category. When you then select the Kernel Tuner, a pop-up containing a list of subsystems appears. Selecting a subsystem generates a display of the subsystem's attributes and their values. See Appendix B for descriptions of the attributes displayed by the Kernel Tuner or the sysconfig command.
The sysconfig command is part of a system utility that allows you to modify most of the global variables that affect system performance without needing to rebuild the kernel to put the new values permanently in effect. (Section 3.3 explains how to modify global variables in this way.)
The sysconfig -q command monitors the values of attributes. Each attribute corresponds to a global variable; however, not all global variables have corresponding attributes (that is, only a subset of the global variables in a system have corresponding attributes). Attribute names usually differ slightly from the names that are used for their corresponding global variables in the system configuration file (/usr/sys/conf/system_name) and the param.c file (/usr/sys/system_name/param.c), but they are always very similar.
To examine the current setting of a particular global variable, issue a sysconfig command with the name of the subsystem that owns the variable and the name of the attribute that corresponds to the particular variable:
sysconfig -q subsystem_name [ attribute_name ]
If you omit attribute_name, the values for all of the attributes for the named subsystem are displayed.
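For example, to query a single attribute of the vm subsystem, you could specify the ubc-maxpercent attribute (which also appears in the sample sysconfig -q vm output later in this section):

# sysconfig -q vm ubc-maxpercent
ubc-maxpercent = 100

The value shown here is only illustrative; your system displays its current setting.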
Use the following command to list the subsystem names that you can specify in a sysconfig command:
sysconfig -s
For example:
# sysconfig -s
Cm: loaded and configured
Generic: loaded and configured
Proc: loaded and configured
   .
   .
   .
Xpr: loaded and configured
Rt: loaded and configured
Net: loaded and configured
#
Use the following command to list the values of all of the attributes associated with a particular subsystem:
sysconfig -q subsystem_name
# sysconfig -q vm
ubc-minpercent = 10
ubc-maxpercent = 100
   .
   .
   .
vm-syswiredpercent = 80
vm-inswappedmin = 1
# sysconfig -q vfs
name-cache-size = 1029
name-cache-hash-size = 256
   .
   .
   .
max-ufs-mounts = 1000
vnode-deallocation-enable = 1
#
Note that a global variable's value in the system configuration file or the param.c file can differ from the value assigned to the global variable's attribute in the sysconfigtab file (/etc/sysconfigtab) by the sysconfigdb command, or in a running kernel by the sysconfig -r command. In a running system, values established by the sysconfig -r command override values established in the sysconfigtab file, and values in the sysconfigtab file override values in the system configuration file or the param.c file.
If an attribute is not defined in the sysconfigtab file, the sysconfig -q command returns the value of the corresponding parameter in the system configuration file or param.c.
To display the minimum and maximum values that can be given to attributes, issue the following command:
sysconfig -Q subsystem_name [ attribute_list ]
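For example, the following command displays the range information for a single vm attribute (the attribute name is taken from the earlier sysconfig -q vm output; the exact fields displayed depend on your version of the operating system):

# sysconfig -Q vm ubc-maxpercent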
See sysconfig(8) or the Kernel Debugging and Configuration Management Guide for details on the sysconfig command. See Section 3.3 for information on how to tune the values of configuration attributes using the sysconfigdb and sysconfig -r commands.
For descriptions of the configuration attributes that have an effect on system performance, see Appendix B. Note that not all subsystems displayed by a sysconfig -s command are covered in Appendix B. Only those subsystems that have tunable attributes affecting performance are covered.
You can use dbx to examine source files, control program execution, display the state of the program, and debug at the machine-code level. To examine the values of variables and data structures, use the dbx print command.
To examine a running system with dbx, issue the following command:
# dbx -k /vmunix /dev/mem
The following sections describe how to use dbx to examine various subsystems of the Digital UNIX operating system.
You can check virtual memory by using dbx and examining the vm_perfsum structure. Note the vpf_pagefaults field (number of hardware page faults) and the vpf_swapspace field (number of pages of swap space not reserved):
(dbx) p vm_perfsum
struct {
    vpf_pagefaults = 6732100
       .
       .
       .
    vpf_swapspace = 29230
}
(dbx)
See Section 3.4 for information on how to tune the virtual memory subsystem.
To check UFS using dbx, examine the ufs_clusterstats structure to see how efficiently the system is performing cluster read and write transfers. You can examine the cluster reads and writes separately with the ufs_clusterstats_read and ufs_clusterstats_write structures.
The following example shows a system that is not clustering efficiently:
(dbx) p ufs_clusterstats
struct {
    full_cluster_transfers = 3130
    part_cluster_transfers = 9786
    non_cluster_transfers = 16833
    sum_cluster_transfers = {
        [0] 0
        [1] 24644
        [2] 1128
        [3] 463
        [4] 202
        [5] 55
        [6] 117
        [7] 36
        [8] 123
        [9] 0
    }
}
(dbx)
The preceding example shows 24644 single-block transfers and no 9-block transfers. The trend of the data shown in the example is the reverse of what you want to see: a large number of single-block transfers and a declining number of multiblock (1 - 9) transfers. However, if the files are all small, this may be the best blocking that you can achieve.
See Section 3.6.1.2 for information on how to tune the UFS file system.
The UFS namei cache stores recently used file system pathname/inode number pairs. It also stores inode information for files that were referenced but not found. Having this information in the cache substantially reduces the amount of searching that is needed to perform pathname translations.
To check the namei cache, use dbx and look at the nchstats data structure. In particular, look at the ncs_goodhits, ncs_neghits, and ncs_miss fields to determine the hit rate. The hit rate, computed as ncs_goodhits plus ncs_neghits divided by the sum of ncs_goodhits, ncs_neghits, and ncs_miss, should be above 80 percent.
For example:
(dbx) p nchstats
struct {
    ncs_goodhits = 9748603    - found a pair
    ncs_neghits = 888729      - found a pair that didn't exist
    ncs_badhits = 23470
    ncs_falsehits = 69371
    ncs_miss = 1055430        - did not find a pair
    ncs_long = 4067           - name was too long to fit in the cache
    ncs_pass2 = 127950
    ncs_2passes = 195763
    ncs_dirscan = 47
}
(dbx)
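Using the sample values shown above, the hit rate is (9748603 + 888729) divided by (9748603 + 888729 + 1055430), or approximately 91 percent, which is above the 80 percent guideline.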
For information on how to improve the namei cache hit rate, see Section 3.6.1. For information on how to improve namei cache lookup speeds, see Section 3.4.1.3.
To check the UBC, use dbx to examine the vm_perfsum structure. In particular, look at the vpf_pgiowrites field (number of I/O operations for page outs generated by the page stealing daemon) and the vpf_ubcalloc field (number of times the UBC had to allocate a page from the virtual memory free page list to satisfy memory demands). For example:
(dbx) p vm_perfsum
struct {
    vpf_pagefaults = 6732100
    vpf_kpagefaults = 119865
    vpf_cowfaults = 926159
    vpf_cowsteals = 192703
    vpf_zfod = 2720195
    vpf_kzfod = 119865
    vpf_pgiowrites = 1882
    vpf_pgwrites = 4747
    vpf_pgioreads = 1874108
    vpf_pgreads = 1412
    vpf_swapreclaims = 4
    vpf_taskswapouts = 0
    vpf_taskswapins = 0
    vpf_vplmsteal = 1411
    vpf_vplmstealwins = 1365
    vpf_vpseqdrain = 0
    vpf_ubchit = 3851
    vpf_ubcalloc = 103378
    vpf_ubcpushes = 0
    vpf_ubcpagepushes = 0
    vpf_ubcdirtywra = 0
    vpf_ubcreclaim = 0
    vpf_reactivate = 1973
    vpf_allocatedpages = 16177
    vpf_wiredpages = 2805
    vpf_ubcpages = 5494
    vpf_freepages = 3384
    vpf_swapspace = 29230
}
(dbx)
The vpf_ubcpages field gives the number of pages of physical memory that the UBC is using to cache file data. If the UBC is using significantly more than half of physical memory and the paging rate (vpf_pgiowrites field) is high, you should probably reduce ubc-maxpercent to 50 percent. This should cause a decrease in the paging activity.
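For example, assuming the vm subsystem exports the ubc-maxpercent run-time attribute shown in the earlier sysconfig -q vm output, a change such as the following might be applied to a running system; this is only a sketch, and you should review Section 3.3 and Section 3.4.1 before changing the attribute:

# sysconfig -r vm ubc-maxpercent=50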
You can also monitor the UBC by examining the ufs_getapage_stats kernel data structure. You can calculate the hit rate by dividing the value for read_hits by the value for read_looks. A good hit rate is a rate above 95 percent.
(dbx) p ufs_getapage_stats
struct {
    read_looks = 2059022
    read_hits = 2022488
    read_miss = 36506
}
(dbx)
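In this sample output, the hit rate is 2022488 divided by 2059022, or roughly 98 percent, which is above the 95 percent guideline.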
In addition, you can check the UBC by examining the vm_tune structure and the vt_ubcseqpercent and vt_ubcseqstartpercent fields. These values are used to prevent a large file from completely filling the UBC, thus limiting the amount of memory available to the virtual memory subsystem.
(dbx) p vm_tune
struct {
    vt_cowfaults = 4
    vt_mapentries = 200
    vt_maxvas = 1073741824
    vt_maxwire = 16777216
    vt_heappercent = 7
    vt_anonklshift = 17
    vt_anonklpages = 1
    vt_vpagemax = 16384
    vt_segmentation = 1
    vt_ubcpagesteal = 24
    vt_ubcdirtypercent = 10
    vt_ubcseqstartpercent = 50
    vt_ubcseqpercent = 10
    vt_csubmapsize = 1048576
    vt_ubcbuffers = 256
    vt_syncswapbuffers = 128
    vt_asyncswapbuffers = 4
    vt_clustermap = 1048576
    vt_clustersize = 65536
    vt_zone_size = 0
    vt_kentry_zone_size = 16777216
    vt_syswiredpercent = 80
    vt_inswappedmin = 1
}
When copying large files, the source and destination objects in the UBC will grow very large (up to all of available physical memory). Reducing the value of vt_ubcseqpercent decreases the number of pages that will be used to cache sequentially accessed files (that is, files being moved in memory). The value represents the percent of memory that a sequentially accessed file can grow to before it starts stealing memory from itself. The value imposes a resident set size limit on a file.
See Section 3.4.1 for information on how to tune the UBC.
The metadata buffer cache contains file metadata - superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries. To check the metadata buffer cache, use dbx to examine the bio_stats structure:
(dbx) p bio_stats
struct {
    getblk_hits = 4590388
    getblk_misses = 17569
    getblk_research = 0
    getblk_dupbuf = 0
    getnewbuf_calls = 17590
    getnewbuf_buflocked = 0
    vflushbuf_lockskips = 0
    mntflushbuf_misses = 0
    mntinvalbuf_misses = 0
    vinvalbuf_misses = 0
    allocbuf_buflocked = 0
    ufssync_misses = 0
}
(dbx)
If the miss rate is high, you may want to raise the value of the bufcache attribute. The number of block misses (getblk_misses) divided by the sum of block misses and block hits (getblk_hits) should not be more than 3 percent.
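In the sample output above, the miss rate is 17569 divided by (17569 + 4590388), or about 0.4 percent, which is well under the 3 percent threshold.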
See Section 3.4.1.3 for information on how to tune the metadata buffer cache.
The operating system uses the Common Access Method (CAM) as the operating system interface to the hardware. CAM maintains the xpt_qhead, ccmn_bp_head, and xpt_cb_queue data structures:
Use dbx to examine the three structures:
(dbx) p xpt_qhead
struct {
    xws = struct {
        x_flink = 0xffffffff81f07400
        x_blink = 0xffffffff81f03000
        xpt_flags = 2147483656
        xpt_ccb = (nil)
        xpt_nfree = 300
        xpt_nbusy = 0
    }
    xpt_wait_cnt = 0
    xpt_times_wait = 2
    xpt_ccb_limit = 1048576
    xpt_ccbs_total = 300
    x_lk_qhead = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx) p ccmn_bp_head
struct {
    num_bp = 50
    bp_list = 0xffffffff81f1be00
    bp_wait_cnt = 0
}
(dbx) p xpt_cb_queue
struct {
    flink = 0xfffffc00004d6828
    blink = 0xfffffc00004d6828
    flags = 0
    initialized = 1
    count = 0
    cplt_lock = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
If the values for xpt_wait_cnt or bp_wait_cnt are nonzero, CAM has run out of buffer pool space. If this situation persists, you may be able to eliminate the problem by changing one or more of CAM's I/O attributes (see Section B.8).
The count parameter in xpt_cb_queue is the number of I/O operations that have been completed and are ready to be passed back to a peripheral device driver. Normally, the value of count should be zero or one. If greater than one, it could indicate either a problem or a temporary situation in which a large number of I/O operations are completing simultaneously. If repeated testing demonstrates that the value is consistently greater than one, one or more subsystem components may require tuning.
To check network statistics, use the netstat command (or nfsstat command, see Section 2.2.12). Some problems to look for are as follows:
Most of the information provided by netstat is used to diagnose network hardware or software failures, not to analyze tuning opportunities. See the manual Network Administration for additional information on how to diagnose failures.
The following example shows the output produced by the -i option of the netstat command:
# netstat -i
Name  Mtu   Network     Address       Ipkts   Ierrs    Opkts  Oerrs   Coll
ln0   1500  DLI         none         133194       2    23632      4   4881
ln0   1500  <Link>                   133194       2    23632      4   4881
ln0   1500  red-net     node1        133194       2    23632      4   4881
sl0*  296   <Link>                        0       0        0      0      0
sl1*  296   <Link>                        0       0        0      0      0
lo0   1536  <Link>                      580       0      580      0      0
lo0   1536  loop        localhost       580       0      580      0      0
Use the following command to determine the causes of the input errors (Ierrs) and output errors (Oerrs) shown in the preceding example:
# netstat -is
ln0 Ethernet counters at Fri Jan 14 16:57:36 1994

           4112 seconds since last zeroed
       30307093 bytes received
        3722308 bytes sent
         133245 data blocks received
          23643 data blocks sent
       14956647 multicast bytes received
         102675 multicast blocks received
          18066 multicast bytes sent
            309 multicast blocks sent
           3446 blocks sent, initially deferred
           1130 blocks sent, single collision
           1876 blocks sent, multiple collisions
              4 send failures, reasons include:
                        Excessive collisions
              0 collision detect check failure
              2 receive failures, reasons include:
                        Block check error
                        Framing Error
              0 unrecognized frame destination
              0 data overruns
              0 system buffer unavailable
              0 user buffer unavailable
The -s option for the netstat command displays statistics for each protocol:
# netstat -s
ip:
        67673 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        8616 fragments received
        0 fragments dropped (dup or out of space)
        5 fragments dropped after timeout
        0 packets forwarded
        8 packets not forwardable
        0 redirects sent
icmp:
        27 calls to icmp_error
        0 errors not generated 'cuz old message was icmp
        Output histogram:
                echo reply: 8
                destination unreachable: 27
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 1
                destination unreachable: 4
                echo: 8
        8 message responses generated
igmp:
        365 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        365 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        0 membership reports sent
tcp:
        11219 packets sent
                7265 data packets (139886 bytes)
                4 data packets (15 bytes) retransmitted
                3353 ack-only packets (2842 delayed)
                0 URG only packets
                14 window probe packets
                526 window update packets
                57 control packets
        12158 packets received
                7206 acks (for 139930 bytes)
                32 duplicate acks
                0 acks for unsent data
                8815 packets (1612505 bytes) received in-sequence
                432 completely duplicate packets (435 bytes)
                0 packets with some dup. data (0 bytes duped)
                14 out-of-order packets (0 bytes)
                1 packet (0 bytes) of data after window
                0 window probes
                1 window update packet
                5 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        19 connection requests
        25 connection accepts
        44 connections established (including accepts)
        47 connections closed (including 0 drops)
        3 embryonic connections dropped
        7217 segments updated rtt (of 7222 attempts)
        4 retransmit timeouts
        0 connections dropped by rexmit timeout
        0 persist timeouts
        0 keepalive timeouts
        0 keepalive probes sent
        0 connections dropped by keepalive
udp:
        12003 packets sent
        48193 packets received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        0 full sockets
        12943 for no port (12916 broadcasts, 0 multicasts)
See netstat(1) for information about the output produced by the various options supported by the netstat command.
To check NFS statistics, use the nfsstat command. For example:
# nfsstat
Server rpc:
calls      badcalls   nullrecv   badlen     xdrcall
38903      0          0          0          0

Server nfs:
calls      badcalls
38903      0

Server nfs V2:
null       getattr    setattr    root       lookup     readlink   read
5 0%       3345 8%    61 0%      0 0%       5902 15%   250 0%     1497 3%
wrcache    write      create     remove     rename     link       symlink
0 0%       1400 3%    549 1%     1049 2%    352 0%     250 0%     250 0%
mkdir      rmdir      readdir    statfs
171 0%     172 0%     689 1%     1751 4%

Server nfs V3:
null       getattr    setattr    lookup     access     readlink   read
0 0%       1333 3%    1019 2%    5196 13%   238 0%     400 1%     2816 7%
write      create     mkdir      symlink    mknod      remove     rmdir
2560 6%    752 1%     140 0%     400 1%     0 0%       1352 3%    140 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
200 0%     200 0%     936 2%     0 0%       3504 9%    3 0%       0 0%
commit
21 0%

Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred
27989      1          0          0          1          0          0
badverfs   timers
0          4

Client nfs:
calls      badcalls   nclget     nclsleep
27988      0          27988      0

Client nfs V2:
null       getattr    setattr    root       lookup     readlink   read
0 0%       3414 12%   61 0%      0 0%       5973 21%   257 0%     1503 5%
wrcache    write      create     remove     rename     link       symlink
0 0%       1400 5%    549 1%     1049 3%    352 1%     250 0%     250 0%
mkdir      rmdir      readdir    statfs
171 0%     171 0%     713 2%     1756 6%

Client nfs V3:
null       getattr    setattr    lookup     access     readlink   read
0 0%       666 2%     9 0%       2598 9%    137 0%     200 0%     1408 5%
write      create     mkdir      symlink    mknod      remove     rmdir
1280 4%    376 1%     70 0%      200 0%     0 0%       676 2%     70 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
100 0%     100 0%     468 1%     0 0%       1750 6%    1 0%       0 0%
commit
10 0%
The ratio of timeouts to calls (which should not exceed 1 percent) is the most important thing to look for in the NFS statistics. A timeout-to-call ratio greater than 1 percent can have a significant negative impact on performance. See Section 3.6.3 for information on how to tune your system to avoid timeouts.
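In the sample client RPC statistics shown above, the ratio is 1 timeout for 27989 calls, which is well under 1 percent, so no timeout problem is indicated.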
If you are attempting to monitor an experimental situation with nfsstat, it may be advisable to reset the NFS counters to zero before you begin the experiment. The nfsstat -z command can be used to clear the counters.