Before you start to monitor your system to identify a performance problem, you should understand your user environment, the applications you are running and how they use the various subsystems, and what is acceptable performance.
The source of the performance problem may not be obvious. For example, if your disk I/O subsystem is swamped with activity, the problem may be in either the virtual memory subsystem or the disk I/O subsystem. In general, obtain as much information as possible about the system before you attempt to tune it.
In addition, how you decide to tune your system depends on how your users and applications utilize the system. For example, if you are running CPU-intensive applications, the virtual memory subsystem may be more important than the unified buffer cache (UBC).
This chapter contains the following information:
Numerous system monitoring tools are available. You may have to use various tools in combination with each other in order to get an accurate picture of your system. In addition to obtaining information about your system when it is running poorly, it is also important for you to obtain information about your system when it is running well. By comparing the two sets of data, you may be able to pinpoint the area that is causing the performance problem.
The primary monitoring tools are described in Table 2-1.
Other tools can also provide you with important monitoring information. These secondary monitoring tools are described in Table 2-2.
Tool | Description |
atom | Serves as a general-purpose framework for creating sophisticated program analysis tools. It includes numerous unsupported prepackaged tools and the following supported tools: third, hiprof, and pixie. The third tool performs memory access checks and detects memory leaks in an application. The hiprof tool produces either a flat or hierarchical profile of an application. The flat profile shows the execution time spent in a given procedure, and the hierarchical profile shows the execution time spent in a given procedure and all of its descendents. The pixie tool partitions an application into basic blocks and counts the number of times each basic block is executed. For details, see the Programmer's Guide or atom(1). |
dbx | Analyzes running kernels and dump files. The dbx command invokes a source-level debugger. You can use dbx with code produced by the cc and as compilers and with machine code. After invoking the dbx debugger, you issue dbx commands that allow you to examine source files, control program execution, display the state of the program, and debug at the machine-code level. To analyze kernels, use the -k option. See Section 2.2.9 for more information on using the dbx command to diagnose system performance problems. |
dumpfs | Displays UFS file system information. This command is useful for getting information about the file system block and fragment size and the minimum free space percentage. See Section 2.2.6 for more information on using the dumpfs command to diagnose system performance problems. |
gprof | Displays call graph profile data showing the effects of called routines. Similar to the prof utility. For details, see the Programmer's Guide or gprof(1). |
ipcs | Reports interprocess communication (IPC) statistics. The ipcs command displays information about currently active message queues, shared-memory segments, semaphores, remote queues, and local queue headers. Information provided in the following fields by the ipcs -a command can be especially useful: QNUM, the number of messages currently outstanding on the associated message queue; CBYTES, the number of bytes in messages currently outstanding on the queue; QBYTES, the maximum number of bytes allowed in messages outstanding on the queue; SEGSZ, the size of the associated shared memory segment; and NSEMS, the number of semaphores in the set associated with the semaphore entry. See ipcs(1) for details. |
kdbx | Analyzes running kernels and dump files. The kdbx debugger is an interactive program that lets you examine either the running kernel or dump files created by the savecore utility. In either case, you will be examining an object file and a core file. For running systems, these files are usually /vmunix and /dev/mem, respectively. Dump files created by savecore are saved in the directory specified by the /sbin/init.d/savecore script, which by default is /var/adm/crash. All dbx commands are available in kdbx using the dbx option. See the manual Kernel Debugging or kdbx(8) for details. |
kprofile | Profiles the kernel using the performance counters in the hardware. See the manual Kernel Debugging or kprofile(1) for details. |
nfswatch | Monitors all NFS network traffic and divides it into several categories. The number and percentage of packets received in each category appears on the screen in a continuously updated display. Your kernel must be configured with the packetfilter option. See nfswatch(8) and packetfilter(7) for details. |
pixie | Provides basic block counting data when used with prof. |
prof | Displays statistics on where time is being spent - at the routine level, basic block level, or instruction level - during the execution of a program. This information will help you to determine where to concentrate your efforts to optimize source code. |
showfdmn | Displays the attributes of an AdvFS file domain and detailed information about each volume in the file domain. |
showfile | Displays the full storage allocation map (extent map) for files in an Advanced File System (AdvFS). An extent is a contiguous area of disk space that the file system allocates to a file. |
showfsets | Displays the filesets (or clone filesets) and their characteristics in a specified domain. |
swapon | Specifies additional disk space for paging and swapping and displays swap space utilization, including the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. See Section 2.2.4 for more information on using the swapon command to diagnose system performance. |
tcpdump | Displays network traffic. The tcpdump command prints out the headers of packets on a network interface that match the Boolean expression. Your kernel must be configured with the packetfilter option. See tcpdump(8) and packetfilter(7) for details. |
uprofile | Profiles user code using performance counters in the hardware. See uprofile(1) for details. |
voldg | Displays, with the list option, information about an LSM diskgroup's attributes. See voldg(8) for details. |
voldisk | Displays, with the list option, a disk's configuration and attribute information. See voldisk(8) for details. |
volprint | Displays information from records in the LSM configuration database. See volprint(8) for more information. |
volstat | Displays Logical Storage Manager statistics for LSM volumes, plexes, subdisks, or disks. See volstat(8) for details. |
voltrace | Prints records from an event log. Sets event trace masks to determine what type of events will be tracked. See voltrace(8) for more information. |
volwatch | Monitors LSM for failure events and sends mail to the specified user. See volwatch(8) for more information. |
w | Displays a summary of current system activity. The system summary shows the current time, the amount of time since the system was last started, the number of users logged in to the system, and the load averages. The load average numbers give the number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds. See w(1) for details. |
xload | Displays the system load average for X. The xload command displays a periodically updating histogram of the system load average. See xload(1X) for details. |
POLYCENTER Performance Solution, a layered product, is also available as a monitoring tool. It can monitor many Digital UNIX nodes simultaneously. A single-node version of the product is included with the operating system at no extra charge.
POLYCENTER Performance Solution has a graphical user interface (GUI) called Performance Manager. Performance Manager is a real-time performance monitor that allows you to detect and correct performance problems. Graphs and charts can show hundreds of different system values, including CPU performance, memory usage, disk transfers, file-system capacity, network efficiency, and AdvFS and cluster-specific metrics.
Thresholds can be set to alert you to or correct a problem when it occurs, and archives of data can be kept for high-speed playback or long-term trend analysis.
Performance Manager has performance analysis and system management scripts, as well as cluster-specific and AdvFS-specific scripts. These scripts can be run simultaneously on multiple nodes from the GUI.
Performance Manager automatically discovers cluster members when a single cluster member node is specified, and it can monitor both individual cluster members and an entire cluster concurrently.
For details on POLYCENTER Performance Solution, see the manual POLYCENTER Performance Solution for UNIX Systems: User's Guide.
The following sections describe how to use monitoring tools to identify the system component or subsystem that is causing a performance degradation. Once you determine which subsystem or component is causing the problem and you are sure that you understand your system environment and the needs of your users, refer to the appropriate section in Chapter 3 for information on tuning the particular subsystem or component.
The ps command displays the current status of the system processes. You can use it to determine the current running processes, their state, and how they utilize system memory. The command lists processes in order of decreasing CPU usage, so you can easily determine which processes are using the most CPU time. Be aware that ps is only a snapshot of the system; by the time the command finishes executing, the system state has probably changed. For example, one of the first lines of the command may refer to the ps command itself.
An example of the ps command follows:
# ps aux
USER       PID  %CPU %MEM    VSZ    RSS  TTY  S     STARTED         TIME COMMAND
chen      2225   5.0  0.3  1.35M   256K  p9   U    13:24:58      0:00.36 cp /vmunix /tmp
root      2236   3.0  0.5  1.59M   456K  p9   R +  13:33:21      0:00.08 ps aux
sorn      2226   1.0  0.6  2.75M   552K  p9   S +  13:25:01      0:00.05 vi met.ps
root       347   1.0  4.0  9.58M   3.72  ??   S    Nov 07       01:26:44 /usr/bin/X11/X -a
root      1905   1.0  1.1  6.10M   1.01  ??   R    16:55:16      0:24.79 /usr/bin/X11/dxpa
sorn      2228   0.0  0.5  1.82M   504K  p5   S +  13:25:03      0:00.02 more
sorn      2202   0.0  0.5  2.03M   456K  p5   S    13:14:14      0:00.23 -csh (csh)
root         0   0.0 12.7   356M   11.9  ??   R <  Nov 07     3-17:26:13 [kernel idle]
The ps command includes the following information that you can use to diagnose CPU and virtual memory problems:
From the output of the ps command, you can determine which processes are consuming most of your system's CPU time and memory and whether processes are swapped out. Concentrate on processes that are runnable or paging. Here are some concerns to keep in mind:
If a process using a large amount of CPU time is running correctly, you may want to lower its priority with either the nice or renice command. Note that these commands have no effect on memory usage by a process.
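For example, assuming the cp process with PID 2225 shown in the earlier ps output were consuming too much CPU time, you could lower its priority with a command similar to the following (the PID shown is purely illustrative):

# renice 4 -p 2225

The renice command changes only the scheduling priority of the running process; as noted above, it does not reduce the amount of memory the process uses.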
For information about memory tuning, see Section 3.4. For information about improving the performance of your applications, see the Programmer's Guide.
The uptime command shows how long a system has been running and the load average. The load average counts jobs that are waiting for disk I/O and also applications whose priorities have been changed with either the nice or renice command. The load average numbers give the average number of jobs in the run queue for the last 5 seconds, the last 30 seconds, and the last 60 seconds.
An example of the uptime command follows:
# uptime
1:48pm up 7 days, 1:07, 35 users, load average: 7.12, 10.33, 10.31
Note whether the load is increasing or decreasing. An acceptable load average depends on your type of system and how it is being used. In general, for a large system, a load of 10 is high, and a load of 3 is low. Workstations should have a load of 1 or 2. If the load is high, look at what processes are running with the ps command. You may want to run some applications during off-peak hours. You can also lower the priority of applications with the nice or renice command to conserve CPU cycles.
See Section 3.2 for additional information on how to reduce the load on your system.
The vmstat command shows the virtual memory, process, and total CPU statistics for a specified time interval. The first line of the output is for all time since a reboot, and each subsequent report is for the last interval. Because the CPU operates faster than the rest of the system, performance bottlenecks usually exist in the memory or I/O subsystems.
An example of the vmstat command follows:
# vmstat 1
Virtual Memory Statistics: (pagesize = 8192)
  procs    memory          pages                                intr          cpu
  r w u   act free wire   fault  cow zero react  pin pout    in   sy   cs  us sy id
  2 66 25 6417 3497 1570   155K  38K  50K     0  46K    0     4  290  165   0  2 98
  4 65 24 6421 3493 1570    120    9   81     0    8    0   585  865  335  37 16 48
  2 66 25 6421 3493 1570     69    0   69     0    0    0   570  968  368   8 22 69
  4 65 24 6421 3493 1570     69    0   69     0    0    0   554  768  370   2 14 84
  4 65 24 6421 3493 1570     69    0   69     0    0    0   865   1K  404   4 20 76
The vmstat command includes information that you can use to diagnose CPU and virtual memory problems. The following fields are particularly important:
While diagnosing a bottleneck situation, keep the following issues in mind:
You must understand how your applications use the system to determine the appropriate values for these times. The goal is to keep the CPU as productive as possible. Idle CPU cycles occur when no runnable processes exist or when the CPU is waiting to complete an I/O or memory request.
The following list presents information on how to interpret the values for user, idle, and system time:
Note that a high system time and low idle time could be caused by failing hardware. Use the uerf command to check your hardware.
A high system time could also indicate that the system is thrashing; that is, the amount of memory available to the virtual memory subsystem has gotten so low that the system is spending all its time paging and swapping in an attempt to regain memory. A system that spends more than 50 percent of its time in system mode or idle mode may be doing a lot of I/O, so this could indicate a virtual memory problem.
If you have a high idle time and you are sure that your system has a typical load, one or more of the following problems may exist: the hardware may be saturated (bus bandwidth, arm motion, CPU cycles, or cache thrashing), one or more kernel data structures may be exhausted, or you may have a hardware or kernel resource block, such as an application, I/O, or network bottleneck.
See Chapter 3 for information on improving CPU usage and I/O operations and for information on tuning virtual memory, disks, and file systems.
Use the swapon command with the -s option to display your swap device configuration. For each swap partition, the command displays the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. This information should help you determine how your swap space is being utilized. For example:
# swapon -s
Swap partition /dev/rz2b (default swap):
    Allocated space:        16384 pages (128MB)
    In-use space:               1 pages (  0%)
    Free space:             16383 pages ( 99%)

Swap partition /dev/rz12c:
    Allocated space:       128178 pages (1001MB)
    In-use space:               1 pages (  0%)
    Free space:            128177 pages ( 99%)

Total swap allocation:
    Allocated space:       144562 pages (1129MB)
    Reserved space:          2946 pages (  2%)
    In-use space:               2 pages (  0%)
    Available space:       141616 pages ( 97%)
See Section 3.4.2.1 for information on how to tune your swap space configuration. Use the iostat command to determine which disks are being used the most.
The iostat command reports I/O statistics for terminals, disks, and the CPU. The first line of the output is the average since boot time, and each subsequent report is for the last interval. An example of the iostat command is as follows:
# iostat 1
      tty       rz1        rz2        rz3         cpu
 tin  tout  bps  tps   bps  tps   bps  tps   us ni sy id
   0     3    3    1     0    0     8    1   11 10 38 40
   0    58    0    0     0    0     0    0   46  4 50  0
   0    58    0    0     0    0     0    0   68  0 32  0
   0    58    0    0     0    0     0    0   55  2 42  0
The iostat command reports I/O statistics that you can use to diagnose disk I/O performance problems. For example, the command displays information about the following:
Note the following when you use the iostat command:
See Section 3.6 for information on how to improve your disk I/O performance.
The dumpfs command dumps UFS information. The command prints out the super block and cylinder group information. The command is useful for getting information about the file system block and fragment sizes and the minimum free space percentage.
The following example shows part of the output of the dumpfs command:
# dumpfs /dev/rrz3g | more
magic     11954    format   dynamic     time   Tue Sep 14 15:46:52 1993
nbfree    21490    ndir     9           nifree 99541     nffree 60
ncg       65       ncyl     1027        size   409600    blocks 396062
bsize     8192     shift    13          mask   0xffffe000
fsize     1024     shift    10          mask   0xfffffc00
frag      8        shift    3           fsbtodb 1
cpg       16       bpg      798         fpg    6384      ipg    1536
minfree   10%      optim    time        maxcontig 8      maxbpg 2048
rotdelay  0ms      headswitch 0us       trackseek 0us    rps    60
The information contained in the first lines is relevant for tuning. Of specific interest are the following fields:
Keep the following issues in mind:
For information about tuning UFS file system configuration parameters and sysconfigtab configuration attributes to improve your disk I/O performance, see Section 3.6.1.2.
You can use the advscan, showfdmn, showfile, and showfsets commands to display information about AdvFS.
See Section 3.6.1.3 for information about tuning AdvFS.
The advscan command locates pieces of AdvFS domains on disk partitions and in LSM disk groups. Use the advscan command when you have moved disks to a new system, have moved disks around in a way that has changed device numbers, or have lost track of where the domains are. The command is also used for repair if you delete /etc/fdmns, delete a directory domain under /etc/fdmns, or delete some links from a domain directory under /etc/fdmns.
The advscan command accepts a list of volumes or disk groups and searches all partitions and volumes in each. It determines which partitions on a disk are part of an AdvFS file domain. You can run the advscan command to rebuild all or part of your /etc/fdmns directory or you can rebuild it by hand by supplying the names of the partitions in a domain.
The following example scans devices rz0 and rz5 for AdvFS partitions:
# advscan rz0 rz5

Scanning disks  rz0 rz5
Found domains:

usr_domain
        Domain Id         2e09be37.0002eb40
        Created           Thu Jun 23 09:54:15 1994
        Domain volumes    2
        /etc/fdmns links  2
        Actual partitions found:
                          rz0c
                          rz5c
For the following example, the rz6 domains were removed from /etc/fdmns. The advscan command scans device rz6 and re-creates the missing domains.
# advscan -r rz6

Scanning disks  rz6
Found domains:

*unknown*
        Domain Id         2f2421ba.0008c1c0
        Created           Mon Jan 23 13:38:02 1995
        Domain volumes    1
        /etc/fdmns links  0
        Actual partitions found:
                          rz6a*

*unknown*
        Domain Id         2f535f8c.000b6860
        Created           Tue Feb 28 09:38:20 1995
        Domain volumes    1
        /etc/fdmns links  0
        Actual partitions found:
                          rz6b*

Creating /etc/fdmns/domain_rz6a/
        linking rz6a

Creating /etc/fdmns/domain_rz6b/
        linking rz6b
See advscan(8) for details on the advscan command.
The showfdmn command displays the attributes of an AdvFS file domain and detailed information about each volume in the file domain. The following example of the showfdmn command displays domain information for the /usr file domain:
% showfdmn usr

               Id              Date Created  LogPgs  Domain Name
2b5361ba.000791be  Tue Jan 12 16:26:34 1993     256  usr

  Vol  512-Blks    Free  % Used  Cmode  Rblks  Wblks  Vol Name
   1L    820164  351580     57%     on    256    256  /dev/rz0d
See showfdmn(8) for information about the output of the command.
The showfile command displays the full storage allocation map (extent map) for files in an Advanced File System (AdvFS). An extent is a contiguous area of disk space that the file system allocates to a file. The following example of the showfile command displays the AdvFS-specific attributes for all of the files in the current working directory:
# showfile *

        Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  Log  Perf  File
   22a.001    1    16      1    simple    **     **  off   50%  Mail
     7.001    1    16      1    simple    **     **  off   20%  bin
   1d8.001    1    16      1    simple    **     **  off   33%  c
  1bff.001    1    16      1    simple    **     **  off   82%  dxMail
   218.001    1    16      1    simple    **     **  off   26%  emacs
   1ed.001    1    16      0    simple    **     **  off  100%  foo
   1ee.001    1    16      1    simple    **     **  off   77%  lib
   1c8.001    1    16      1    simple    **     **  off   94%  obj
   23f.003    1    16      1    simple    **     **  off  100%  sb
  170a.008    1    16      2    simple    **     **  off   35%  t
     6.001    1    16     12    simple    **     **  off   16%  tmp
The following example of the showfile command shows the attributes and extent information for the mail file, which is a simple file:
# showfile -x mail

        Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  Log  Perf  File
 4198.800d    2    16     27    simple    **     **  off   66%  tutorial

    extentMap: 1
        pageOff    pageCnt    vol    volBlock    blockCnt
              0          5      2      781552          80
              5         12      2      785776         192
             17         10      2      786800         160
        extentCnt: 3
See showfile(8) for information about the output of the command.
The showfsets command displays the filesets (or clone filesets) and their characteristics in a specified domain.
The following is an example of the showfsets command:
# showfsets dmn

mnt
        Id           : 2c73e2f9.000f143a.1.8001
        Clone is     : mnt_clone
        Files        :       79,  limit =     1000
        Blocks (1k)  :      331,  limit =    25000
        Quota Status : user=on  group=on

mnt_clone
        Id           : 2c73e2f9.000f143a.2.8001
        Clone of     : mnt
        Revision     : 1
See showfsets(8) for information about the output of the command.
A number of commands are available to display LSM-related information and to monitor LSM-related activity:
In addition, you can use the Analyze menu in LSM's graphical interface (dxlsm) to monitor activity on volumes, LSM disks, and subdisks.
See the manual Logical Storage Manager for more information about monitoring LSM and for information about LSM performance management.
The voldg list command displays brief information about the attributes of LSM disk groups. If you specify a particular disk group, the command displays more detailed information on the status and configuration of the specified group.
The following example uses the voldg list command to display information about the rootdg disk group:
# voldg list rootdg

Group:     rootdg
dgid:      795887625.1025.system32
import-id: 0.1
flags:
config:    seqno=0.1351 permlen=347 free=316 templen=9 loglen=52
config disk rz9  copy 1 len=347 state=clean online
config disk rz10 copy 1 len=347 state=clean online
config disk rz12 copy 1 len=347 state=clean online
config disk rz15 copy 1 len=347 state=clean online
config disk rz11 copy 1 len=347 state=clean online
config disk rz13 copy 1 len=347 state=clean online
log disk rz8  copy 1 len=200
log disk rz8  copy 2 len=200
log disk rz9  copy 1 len=52
log disk rz10 copy 1 len=52
log disk rz12 copy 1 len=52
log disk rz15 copy 1 len=52
log disk rz11 copy 1 len=52
log disk rz13 copy 1 len=52
log disk rz3  copy 1 len=200
log disk rz3  copy 2 len=200
For more information, see the voldg(8) reference page.
The voldisk list command displays the device names for all recognized disks, the disk names, the disk group names associated with each disk, and the status of each disk.
The following example uses the voldisk list command to display information about the rz15 disk:
# voldisk list rz15

Device:    rz15
devicetag: rz15
type:      sliced
hostid:    system32
disk:      name=rz15 id=795887633.1049.system32
group:     name=rootdg id=795887625.1025.system32
flags:     online ready private imported
pubpaths:  block=/dev/rz15g char=/dev/rrz15g
privpaths: block=/dev/rz15h char=/dev/rrz15h
version:   1.1
iosize:    512
public:    slice=6 offset=0 len=2697533
private:   slice=7 offset=0 len=512
update:    time=795888426 seqno=0.18
headers:   0 248
configs:   count=1 len=347
logs:      count=1 len=52
Defined regions:
 config   priv    17-   247[   231]: copy=01 offset=000000
 config   priv   249-   364[   116]: copy=01 offset=000231
 log      priv   365-   416[    52]: copy=01 offset=000000
For more information, see the voldisk(8) reference page.
The volprint command displays information from records in the LSM configuration database. You can select the records to be displayed by name or using special search expressions. In addition, you can display record association hierarchies, so that the structure of records is more apparent.
Use the volprint command to display disk group, disk media, volume, plex, and subdisk records. Use the voldisk list command to display disk access records, or physical disk information.
The following example uses the volprint command to show the status of the voldev1 volume:
# volprint -ht voldev1

DG NAME         GROUP-ID
DM NAME         DEVICE     TYPE     PRIVLEN  PUBLEN    PUBPATH
V  NAME         USETYPE    KSTATE   STATE    LENGTH    READPOL   PREFPLEX
PL NAME         VOLUME     KSTATE   STATE    LENGTH    LAYOUT    ST-WIDTH  MODE
SD NAME         PLEX       PLOFFS   DISKOFFS LENGTH    DISK-NAME DEVICE

v  voldev1      fsgen      ENABLED  ACTIVE   804512    SELECT    -
pl voldev1-01   voldev1    ENABLED  TEMP     804512    CONCAT    -         WO
sd rz8-01       voldev1-01 0        0        804512    rz8       rz8
pl voldev1-02   voldev1    ENABLED  ACTIVE   804512    CONCAT    -         RW
sd dev1-01      voldev1-02 0        2295277  402256    dev1      rz9
sd rz15-02      voldev1-02 402256   2295277  402256    rz15      rz15
For more information, see the volprint(8) reference page.
The volstat command provides information about activity on volumes, plexes, subdisks, and disks under LSM control. It reports statistics that reflect the activity levels of LSM objects since boot time.
The amount of information displayed depends on what options you specify to volstat. For example, you can display statistics for a specific LSM object, or you can display statistics for all objects at one time. You can also specify a disk group, in which case, only statistics for objects in that disk group are displayed; if you do not specify a particular disk group, volstat displays statistics for the default disk group (rootdg).
The volstat command can also be used to reset the statistics information to zero. This can be done for all objects or for only specified objects. Resetting just prior to a particular operation makes it possible to measure the subsequent impact of that particular operation.
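For example, a command similar to the following might be used to reset the statistics for all objects in the default disk group immediately before starting a test workload (the -r reset option is an assumption here; confirm the exact reset syntax in volstat(8) on your system):

# volstat -r

Running volstat again after the workload completes then shows only the activity generated since the reset.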
The following example shows statistics on LSM volumes.
# volstat

                      OPERATIONS            BLOCKS          AVG TIME(ms)
TYP NAME            READ     WRITE      READ      WRITE     READ    WRITE
vol archive          865       807      5722       3809     32.5     24.0
vol home            2980      5287      6504      10550     37.7    221.1
vol local          49477     49230    507892     204975     28.5     33.5
vol src            79174     23603    425472     139302     22.4     30.9
vol swapvol        22751     32364    182001     258905     25.3    323.2
For more information, see the volstat(8) reference page.
The voltrace command reads an event log (/dev/volevent) and prints formatted event log records to standard output. Using voltrace, you can set event trace masks to determine which type of events will be tracked. For example, you can trace I/O events, configuration changes, or I/O errors.
The following sample voltrace command shows status on all new events.
# voltrace -n -e all

18446744072623507277 IOTRACE 439: req 3987131 v:rootvol p:rootvol-01 \
    d:root_domain s:rz3-02 iot write lb 0 b 63120 len 8192 tm 12
18446744072623507277 IOTRACE 440: req 3987131 \
    v:rootvol iot write lb 0 b 63136 len 8192 tm 12
For more information, see the voltrace(8) reference page.
The volwatch command monitors LSM for failure events and sends mail to the specified user.
For more information, see the volwatch(8) reference page.
LSM's graphical interface (dxlsm) includes an Analyze menu. The Analyze menu allows you to display statistics about volumes, LSM disks, and subdisks. The information is displayed graphically, using colors and patterns on the disk icons, and numerically, using the Analysis Statistics form. You can use the Analysis Parameters form to tailor the information that will be displayed.
See the manual Logical Storage Manager for information about dxlsm.
System parameters are global variables. You can monitor the settings of these variables by using the Kernel Tuner or the sysconfig command. As explained in Section 2.2.10, you can also do this monitoring with dbx.
The Kernel Tuner (dxkerneltuner) is provided by the Common Desktop Environment's (CDE) graphical user interface. To access the Kernel Tuner, click on the Application Manager icon in the CDE menu bar and then select the Monitoring/Tuning category. When you then select the Kernel Tuner, a pop-up containing a list of subsystems appears. Selecting a subsystem generates a display of the subsystem's attributes and their values. See Appendix B for descriptions of the attributes displayed by the Kernel Tuner or the sysconfig command.
The sysconfig command is part of a system utility that allows you to modify most of the global variables that affect system performance without needing to rebuild the kernel to put the new values permanently in effect. (Section 3.3 explains how to modify global variables in this way.)
The sysconfig -q command monitors the values of attributes. Each attribute corresponds to a global variable; however, not all global variables have corresponding attributes (that is, only a subset of the global variables in a system have corresponding attributes). Attribute names usually differ slightly from the names that are used for their corresponding global variables in the system configuration file (/usr/sys/conf/system_name) and the param.c file (/usr/sys/system_name/param.c), but they are always very similar.
To examine the current setting of a particular global variable, issue a sysconfig command with the name of the subsystem that owns the variable and the name of the attribute that corresponds to the particular variable:
sysconfig -q subsystem_name [ attribute_name ]
If you omit attribute_name, the values for all of the attributes for the named subsystem are displayed.
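For example, to query a single attribute of the vm subsystem, you could specify the ubc-maxpercent attribute (which also appears in the sample sysconfig -q vm output later in this section):

# sysconfig -q vm ubc-maxpercent
ubc-maxpercent = 100

The value shown here is only illustrative; your system displays its current setting.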
Use the following command to list the subsystem names that you can specify in a sysconfig command:
sysconfig -s
For example:
# sysconfig -s
Cm: loaded and configured
Generic: loaded and configured
Proc: loaded and configured
   .
   .
   .
Xpr: loaded and configured
Rt: loaded and configured
Net: loaded and configured
#
Use the following command to list the values of all of the attributes associated with a particular subsystem:
sysconfig -q subsystem_name
# sysconfig -q vm
ubc-minpercent = 10
ubc-maxpercent = 100
   .
   .
   .
vm-syswiredpercent = 80
vm-inswappedmin = 1
# sysconfig -q vfs
name-cache-size = 1029
name-cache-hash-size = 256
   .
   .
   .
max-ufs-mounts = 1000
vnode-deallocation-enable = 1
#
Note that a global variable's value in the system configuration file or the param.c file can differ from the value assigned to the global variable's attribute in the sysconfigtab file (/etc/sysconfigtab) by the sysconfigdb command, or in a running kernel by the sysconfig -r command. In a running system, values established by the sysconfig -r command override values established in the sysconfigtab file, and values in the sysconfigtab file override values in the system configuration file or the param.c file.
If an attribute is not defined in the sysconfigtab file, the sysconfig -q command returns the value of the corresponding parameter in the system configuration file or param.c.
To display the minimum and maximum values that can be given to attributes, issue the following command:
sysconfig -Q subsystem_name [ attribute_list ]
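For example, the following command displays the range information for a single vm attribute (the attribute name is taken from the earlier sysconfig -q vm output; the exact fields displayed depend on your version of the operating system):

# sysconfig -Q vm ubc-maxpercent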
See sysconfig(8) or the Kernel Debugging and Configuration Management Guide for details on the sysconfig command. See Section 3.3 for information on how to tune the values of configuration attributes using the sysconfigdb and sysconfig -r commands.
For descriptions of the configuration attributes that have an effect on system performance, see Appendix B. Note that not all subsystems displayed by a sysconfig -s command are covered in Appendix B. Only those subsystems that have tunable attributes affecting performance are covered.
You can use dbx to examine source files, control program execution, display the state of the program, and debug at the machine-code level. To examine the values of variables and data structures, use the dbx print command.
To examine a running system with dbx, issue the following command:
# dbx -k /vmunix /dev/mem
The following sections describe how to use dbx to examine various subsystems of the Digital UNIX operating system.
You can check virtual memory by using dbx and examining the vm_perfsum structure. Note the vpf_pagefaults field (number of hardware page faults) and the vpf_swapspace field (number of pages of swap space not reserved):
(dbx) p vm_perfsum
struct {
    vpf_pagefaults = 6732100
       .
       .
       .
    vpf_swapspace = 29230
}
(dbx)
See Section 3.4 for information on how to tune the virtual memory subsystem.
To check UFS using dbx, examine the ufs_clusterstats structure to see how efficiently the system is performing cluster read and write transfers. You can examine the cluster reads and writes separately with the ufs_clusterstats_read and ufs_clusterstats_write structures.
The following example shows a system that is not clustering efficiently:
(dbx) p ufs_clusterstats
struct {
    full_cluster_transfers = 3130
    part_cluster_transfers = 9786
    non_cluster_transfers = 16833
    sum_cluster_transfers = {
        [0] 0
        [1] 24644
        [2] 1128
        [3] 463
        [4] 202
        [5] 55
        [6] 117
        [7] 36
        [8] 123
        [9] 0
    }
}
(dbx)
The preceding example shows 24644 single-block transfers and no 9-block transfers. The trend of the data shown in the example is the reverse of what you want to see: a large number of single-block transfers and a declining number of multiblock (1 - 9) transfers. However, if the files are all small, this may be the best blocking that you can achieve.
See Section 3.6.1.2 for information on how to tune the UFS file system.
The UFS namei cache stores recently used file system pathname/inode number pairs. It also stores inode information for files that were referenced but not found. Having this information in the cache substantially reduces the amount of searching that is needed to perform pathname translations.
To check the namei cache, use dbx and look at the nchstats data structure. In particular, look at the ncs_goodhits, ncs_neghits, and ncs_miss fields to determine the hit rate. The hit rate, computed as ncs_goodhits plus ncs_neghits divided by the sum of ncs_goodhits, ncs_neghits, and ncs_miss, should be above 80 percent.
For example:
(dbx) p nchstats
struct {
    ncs_goodhits = 9748603    - found a pair
    ncs_neghits = 888729      - found a pair that didn't exist
    ncs_badhits = 23470
    ncs_falsehits = 69371
    ncs_miss = 1055430        - did not find a pair
    ncs_long = 4067           - name was too long to fit in the cache
    ncs_pass2 = 127950
    ncs_2passes = 195763
    ncs_dirscan = 47
}
(dbx)
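Using the sample values shown above, the hit rate is (9748603 + 888729) divided by (9748603 + 888729 + 1055430), or approximately 91 percent, which is above the 80 percent guideline.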
For information on how to improve the namei cache hit rate, see Section 3.6.1. For information on how to improve namei cache lookup speeds, see Section 3.4.1.3.
To check the UBC, use dbx to examine the vm_perfsum structure. In particular, look at the vpf_pgiowrites field (number of I/O operations for page outs generated by the page stealing daemon) and the vpf_ubcalloc field (number of times the UBC had to allocate a page from the virtual memory free page list to satisfy memory demands). For example:
(dbx) p vm_perfsum
struct {
    vpf_pagefaults = 6732100
    vpf_kpagefaults = 119865
    vpf_cowfaults = 926159
    vpf_cowsteals = 192703
    vpf_zfod = 2720195
    vpf_kzfod = 119865
    vpf_pgiowrites = 1882
    vpf_pgwrites = 4747
    vpf_pgioreads = 1874108
    vpf_pgreads = 1412
    vpf_swapreclaims = 4
    vpf_taskswapouts = 0
    vpf_taskswapins = 0
    vpf_vplmsteal = 1411
    vpf_vplmstealwins = 1365
    vpf_vpseqdrain = 0
    vpf_ubchit = 3851
    vpf_ubcalloc = 103378
    vpf_ubcpushes = 0
    vpf_ubcpagepushes = 0
    vpf_ubcdirtywra = 0
    vpf_ubcreclaim = 0
    vpf_reactivate = 1973
    vpf_allocatedpages = 16177
    vpf_wiredpages = 2805
    vpf_ubcpages = 5494
    vpf_freepages = 3384
    vpf_swapspace = 29230
}
(dbx)
The vpf_ubcpages field gives the number of pages of physical memory that the UBC is using to cache file data. If the UBC is using significantly more than half of physical memory and the paging rate (vpf_pgiowrites field) is high, you should probably reduce ubc-maxpercent to 50 percent. This should cause a decrease in the paging activity.
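For example, assuming the vm subsystem exports the ubc-maxpercent run-time attribute shown in the earlier sysconfig -q vm output, a change such as the following might be applied to a running system; this is only a sketch, and you should review Section 3.3 and Section 3.4.1 before changing the attribute:

# sysconfig -r vm ubc-maxpercent=50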
You can also monitor the UBC by examining the ufs_getapage_stats kernel data structure. You can calculate the hit rate by dividing the value for read_hits by the value for read_looks. A good hit rate is a rate above 95 percent.
(dbx) p ufs_getapage_stats
struct {
    read_looks = 2059022
    read_hits = 2022488
    read_miss = 36506
}
(dbx)
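In this sample output, the hit rate is 2022488 divided by 2059022, or roughly 98 percent, which is above the 95 percent guideline.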
In addition, you can check the UBC by examining the vm_tune structure and the vt_ubcseqpercent and vt_ubcseqstartpercent fields. These values are used to prevent a large file from completely filling the UBC, thus limiting the amount of memory available to the virtual memory subsystem.
(dbx) p vm_tune
struct {
    vt_cowfaults = 4
    vt_mapentries = 200
    vt_maxvas = 1073741824
    vt_maxwire = 16777216
    vt_heappercent = 7
    vt_anonklshift = 17
    vt_anonklpages = 1
    vt_vpagemax = 16384
    vt_segmentation = 1
    vt_ubcpagesteal = 24
    vt_ubcdirtypercent = 10
    vt_ubcseqstartpercent = 50
    vt_ubcseqpercent = 10
    vt_csubmapsize = 1048576
    vt_ubcbuffers = 256
    vt_syncswapbuffers = 128
    vt_asyncswapbuffers = 4
    vt_clustermap = 1048576
    vt_clustersize = 65536
    vt_zone_size = 0
    vt_kentry_zone_size = 16777216
    vt_syswiredpercent = 80
    vt_inswappedmin = 1
}
When copying large files, the source and destination objects in the UBC will grow very large (up to all of available physical memory). Reducing the value of vt_ubcseqpercent decreases the number of pages that will be used to cache sequentially accessed files (that is, files being moved in memory). The value represents the percent of memory that a sequentially accessed file can grow to before it starts stealing memory from itself. The value imposes a resident set size limit on a file.
See Section 3.4.1 for information on how to tune the UBC.
The metadata buffer cache contains file metadata - superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries. To check the metadata buffer cache, use dbx to examine the bio_stats structure:
(dbx) p bio_stats
struct {
    getblk_hits = 4590388
    getblk_misses = 17569
    getblk_research = 0
    getblk_dupbuf = 0
    getnewbuf_calls = 17590
    getnewbuf_buflocked = 0
    vflushbuf_lockskips = 0
    mntflushbuf_misses = 0
    mntinvalbuf_misses = 0
    vinvalbuf_misses = 0
    allocbuf_buflocked = 0
    ufssync_misses = 0
}
(dbx)
If the miss rate is high, you may want to raise the value of the bufcache attribute. The number of block misses (getblk_misses) divided by the sum of block misses and block hits (getblk_hits) should not be more than 3 percent.
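In the sample output above, the miss rate is 17569 divided by (17569 + 4590388), or about 0.4 percent, which is well under the 3 percent threshold.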
See Section 3.4.1.3 for information on how to tune the metadata buffer cache.
The operating system uses the Common Access Method (CAM) as the operating system interface to the hardware. CAM maintains the xpt_qhead, ccmn_bp_head, and xpt_cb_queue data structures:
Use dbx to examine the three structures:
(dbx) p xpt_qhead
struct {
    xws = struct {
        x_flink = 0xffffffff81f07400
        x_blink = 0xffffffff81f03000
        xpt_flags = 2147483656
        xpt_ccb = (nil)
        xpt_nfree = 300
        xpt_nbusy = 0
    }
    xpt_wait_cnt = 0
    xpt_times_wait = 2
    xpt_ccb_limit = 1048576
    xpt_ccbs_total = 300
    x_lk_qhead = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx) p ccmn_bp_head
struct {
    num_bp = 50
    bp_list = 0xffffffff81f1be00
    bp_wait_cnt = 0
}
(dbx) p xpt_cb_queue
struct {
    flink = 0xfffffc00004d6828
    blink = 0xfffffc00004d6828
    flags = 0
    initialized = 1
    count = 0
    cplt_lock = struct {
        sl_data = 0
        sl_info = 0
        sl_cpuid = 0
        sl_lifms = 0
    }
}
(dbx)
If the values for xpt_wait_cnt or bp_wait_cnt are nonzero, CAM has run out of buffer pool space. If this situation persists, you may be able to eliminate the problem by changing one or more of CAM's I/O attributes (see Section B.8).
The count parameter in xpt_cb_queue is the number of I/O operations that have been completed and are ready to be passed back to a peripheral device driver. Normally, the value of count should be zero or one. If greater than one, it could indicate either a problem or a temporary situation in which a large number of I/O operations are completing simultaneously. If repeated testing demonstrates that the value is consistently greater than one, one or more subsystem components may require tuning.
To check network statistics, use the netstat command (or nfsstat command, see Section 2.2.12). Some problems to look for are as follows:
Most of the information provided by netstat is used to diagnose network hardware or software failures, not to analyze tuning opportunities. See the manual Network Administration for additional information on how to diagnose failures.
The following example shows the output produced by the -i option of the netstat command:
# netstat -i
Name  Mtu   Network     Address       Ipkts   Ierrs    Opkts  Oerrs   Coll
ln0   1500  DLI         none         133194       2    23632      4   4881
ln0   1500  <Link>                   133194       2    23632      4   4881
ln0   1500  red-net     node1        133194       2    23632      4   4881
sl0*  296   <Link>                        0       0        0      0      0
sl1*  296   <Link>                        0       0        0      0      0
lo0   1536  <Link>                      580       0      580      0      0
lo0   1536  loop        localhost       580       0      580      0      0
Use the following command to determine the causes of the input errors (Ierrs) and output errors (Oerrs) shown in the preceding example:
# netstat -is
ln0 Ethernet counters at Fri Jan 14 16:57:36 1994

           4112 seconds since last zeroed
       30307093 bytes received
        3722308 bytes sent
         133245 data blocks received
          23643 data blocks sent
       14956647 multicast bytes received
         102675 multicast blocks received
          18066 multicast bytes sent
            309 multicast blocks sent
           3446 blocks sent, initially deferred
           1130 blocks sent, single collision
           1876 blocks sent, multiple collisions
              4 send failures, reasons include:
                        Excessive collisions
              0 collision detect check failure
              2 receive failures, reasons include:
                        Block check error
                        Framing Error
              0 unrecognized frame destination
              0 data overruns
              0 system buffer unavailable
              0 user buffer unavailable
The -s option for the netstat command displays statistics for each protocol:
# netstat -s
ip:
        67673 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with header length < data size
        0 with data length < header length
        8616 fragments received
        0 fragments dropped (dup or out of space)
        5 fragments dropped after timeout
        0 packets forwarded
        8 packets not forwardable
        0 redirects sent
icmp:
        27 calls to icmp_error
        0 errors not generated 'cuz old message was icmp
        Output histogram:
                echo reply: 8
                destination unreachable: 27
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Input histogram:
                echo reply: 1
                destination unreachable: 4
                echo: 8
        8 message responses generated
igmp:
        365 messages received
        0 messages received with too few bytes
        0 messages received with bad checksum
        365 membership queries received
        0 membership queries received with invalid field(s)
        0 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        0 membership reports sent
tcp:
        11219 packets sent
                7265 data packets (139886 bytes)
                4 data packets (15 bytes) retransmitted
                3353 ack-only packets (2842 delayed)
                0 URG only packets
                14 window probe packets
                526 window update packets
                57 control packets
        12158 packets received
                7206 acks (for 139930 bytes)
                32 duplicate acks
                0 acks for unsent data
                8815 packets (1612505 bytes) received in-sequence
                432 completely duplicate packets (435 bytes)
                0 packets with some dup. data (0 bytes duped)
                14 out-of-order packets (0 bytes)
                1 packet (0 bytes) of data after window
                0 window probes
                1 window update packet
                5 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        19 connection requests
        25 connection accepts
        44 connections established (including accepts)
        47 connections closed (including 0 drops)
        3 embryonic connections dropped
        7217 segments updated rtt (of 7222 attempts)
        4 retransmit timeouts
        0 connections dropped by rexmit timeout
        0 persist timeouts
        0 keepalive timeouts
        0 keepalive probes sent
        0 connections dropped by keepalive
udp:
        12003 packets sent
        48193 packets received
        0 incomplete headers
        0 bad data length fields
        0 bad checksums
        0 full sockets
        12943 for no port (12916 broadcasts, 0 multicasts)
See netstat(1) for information about the output produced by the various options supported by the netstat command.
To check NFS statistics, use the nfsstat command. For example:
# nfsstat
Server rpc:
calls      badcalls   nullrecv   badlen     xdrcall
38903      0          0          0          0

Server nfs:
calls      badcalls
38903      0

Server nfs V2:
null       getattr    setattr    root       lookup     readlink   read
5 0%       3345 8%    61 0%      0 0%       5902 15%   250 0%     1497 3%
wrcache    write      create     remove     rename     link       symlink
0 0%       1400 3%    549 1%     1049 2%    352 0%     250 0%     250 0%
mkdir      rmdir      readdir    statfs
171 0%     172 0%     689 1%     1751 4%

Server nfs V3:
null       getattr    setattr    lookup     access     readlink   read
0 0%       1333 3%    1019 2%    5196 13%   238 0%     400 1%     2816 7%
write      create     mkdir      symlink    mknod      remove     rmdir
2560 6%    752 1%     140 0%     400 1%     0 0%       1352 3%    140 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
200 0%     200 0%     936 2%     0 0%       3504 9%    3 0%       0 0%
commit
21 0%

Client rpc:
calls      badcalls   retrans    badxid     timeout    wait       newcred
27989      1          0          0          1          0          0
badverfs   timers
0          4

Client nfs:
calls      badcalls   nclget     nclsleep
27988      0          27988      0

Client nfs V2:
null       getattr    setattr    root       lookup     readlink   read
0 0%       3414 12%   61 0%      0 0%       5973 21%   257 0%     1503 5%
wrcache    write      create     remove     rename     link       symlink
0 0%       1400 5%    549 1%     1049 3%    352 1%     250 0%     250 0%
mkdir      rmdir      readdir    statfs
171 0%     171 0%     713 2%     1756 6%

Client nfs V3:
null       getattr    setattr    lookup     access     readlink   read
0 0%       666 2%     9 0%       2598 9%    137 0%     200 0%     1408 5%
write      create     mkdir      symlink    mknod      remove     rmdir
1280 4%    376 1%     70 0%      200 0%     0 0%       676 2%     70 0%
rename     link       readdir    readdir+   fsstat     fsinfo     pathconf
100 0%     100 0%     468 1%     0 0%       1750 6%    1 0%       0 0%
commit
10 0%
The ratio of timeouts to calls (which should not exceed 1 percent) is the most important thing to look for in the NFS statistics. A timeout-to-call ratio greater than 1 percent can have a significant negative impact on performance. See Section 3.6.3 for information on how to tune your system to avoid timeouts.
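In the sample client RPC statistics shown above, the ratio is 1 timeout for 27989 calls, which is well under 1 percent, so no timeout problem is indicated.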
If you are attempting to monitor an experimental situation with nfsstat, it may be advisable to reset the NFS counters to zero before you begin the experiment. The nfsstat -z command can be used to clear the counters.