This manual describes how to use Digital UNIX tools to debug kernel programs and the kernel. It also includes information about managing and analyzing crash dump files.
In addition to the information provided here, tracing a kernel problem can require a basic understanding of one or more of the following technical areas:
See the Alpha Architecture Reference Manual for information on how the Digital UNIX operating system interfaces with the hardware.
This chapter provides an overview of the following topics:
You cannot directly debug a bootstrap-linked kernel because you must
supply the name of an image to the kernel debugging tools. Without the image,
the tools have no access to symbol names, variable names, and so on. Therefore,
the first step in any kernel debugging effort is to determine whether your
kernel was linked at bootstrap time. If the kernel was linked at bootstrap
time, you must then build a kernel image file to use for debugging purposes.
The best way to determine whether your system is bootstrap linked or
statically linked is to use the file command to test the type of
file from which your system was booted. If your system is a bootstrap-linked
system, it was booted from an ASCII text file; otherwise, it was booted from
an executable image file. For example, issue the following command to determine
the type of file from which your system was booted:
Once you create this image, you can debug the kernel as described in
this manual, using the dbx, kdbx, and kdebug
debuggers. When you invoke the dbx or kdbx debugger,
remember to specify the name of the kernel image file you created with the ld command, such as the vmunix.image file shown here.
When you are finished debugging the kernel, you can remove the kernel
image file you created for debugging purposes.
The Digital UNIX system also provides the kdbx debugger,
which is designed especially for debugging kernel code. This debugger contains
a number of special commands, called extensions, that allow you to display
kernel data structures in a readable format. Section 2.2
describes using kdbx and its extensions. (You cannot use the kdbx debugger with the kdebug debugger.)
Another feature of kdbx is that you can customize it by writing
your own extensions. The system contains a set of kdbx library
routines that you can use to create extensions that display kernel data structures
in ways that are meaningful to you. Chapter 3
describes writing kdbx extensions.
You use the dbx or kdbx debugger to examine the
state of processes running on your system and to examine the value of system
parameters. The kdbx debugger provides special commands, called
extensions, that you can use to display kernel data structures. (Section 2.2.3
describes the extensions.)
To examine the state of processes, you invoke the debugger (as described
in Section 2.1 or Section 2.2) using the
following command:
Once in the dbx environment, you use dbx commands
to display process IDs and trace execution of processes. You can perform the
same tasks using the kdbx debugger. The following example shows
the dbx command you use to display process IDs:
Often, looking at the trace of a process that is hanging or has unexpectedly
stopped running reveals the problem. Once you find the problem, you can modify
system parameters, restart daemons, or take other corrective actions.
For more information about the commands you can use to debug the running
kernel, see Section 2.1 and Section 2.2.
The operating system can crash because one of the following occurs:
When a system hangs, it is often necessary to force the system to create
dumps that you can analyze to determine why the system hung. Section 4.7
describes the procedure for forcing a crash dump of a hung system.
The system crashes or hangs because it cannot continue executing. Normally,
even in the case of a hardware exception, the operating system detects the
problem. (For example, a machine-checking routine might discover a hardware
problem and begin the process of crashing the system.) In general, the operating
system performs the following steps when it detects a problem from which it
cannot recover:
The panic function saves the contents of registers and sends
the panic string (a message describing the reason for the system panic) to
the error logger and the console terminal.
If the system is a Symmetric Multiprocessing (SMP) system, the panic function notifies the other CPUs in the system that a panic has occurred. The other CPUs then also execute the panic
function and record the following panic string:
The boot function records the stack.
The dump function copies core memory into swap partitions,
and the system either stops running or begins the reboot process. Console environment
variables control whether the system reboots automatically. (The Installation Guide
describes these environment variables.)
At system reboot time, the copy of core memory saved in the swap partitions
is copied into a file, called a crash dump file. You can analyze the crash
dump file to determine what caused the crash. For information about managing
crash dumps and crash dump files, see Chapter 4. For
examples of analyzing crash dump files, see Chapter 5.
1.1 Linking a Kernel Image for Debugging
By default, the kernel that runs on Digital UNIX systems is a statically
linked image that resides in the file /vmunix. However, your system
might be configured so that it is linked at bootstrap time. In that case, the boot
file is not a bootable image but a text file that describes the hardware
and software that will be present on the running system. Using this information,
the bootstrap linker links the modules that are needed to support this hardware
and software. The linker builds the kernel directly into memory. (For more
information about bootstrap-linked kernels, see the manual Writing
Device Drivers: Tutorial.)
# /usr/bin/file `/usr/sbin/sizer -b`
/etc/sysconfigtab: ascii text
The sizer -b command returns the name of the file from which
the system was booted. This file name is input to the file command,
which determines that the system was booted from an ASCII text file. The output
shown in the preceding example indicates that the system is a bootstrap-linked
system. If the system had been booted from an executable image file named vmunix, the output from the file command would have appeared
as follows:
vmunix: COFF format alpha executable or object module not stripped
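The check described above can be scripted. The following is a minimal sketch, not part of the Digital UNIX tool set: the function name classify_boot_file is hypothetical, and the patterns assume that the file command reports its results in the same wording as the two sample outputs shown in this section.

```shell
#!/bin/sh
# classify_boot_file (hypothetical helper): takes one line of `file`
# output as its argument and prints how the kernel appears to be linked.
# The patterns are assumptions based on the sample outputs in this section.
classify_boot_file() {
    case "$1" in
        *"ascii text"*|*"ASCII text"*) echo "bootstrap-linked" ;;
        *"executable"*)                echo "static" ;;
        *)                             echo "unknown" ;;
    esac
}

# On a running system the argument would come from:
#   /usr/bin/file `/usr/sbin/sizer -b`
# Here the two sample outputs from this section are used instead.
classify_boot_file "/etc/sysconfigtab: ascii text"
classify_boot_file "vmunix: COFF format alpha executable or object module not stripped"
```

A result of "unknown" means the output did not match either sample, in which case you should inspect the output of the file command directly.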
If your system is running a bootstrap-linked kernel,
build a kernel image that is identical to the bootstrap-linked kernel your
system is running, by entering the following command:
# /usr/bin/ld -o vmunix.image `/usr/sbin/sizer -m`
The output from the sizer -m command is a list of the exact modules
and linker flags used to build the currently running bootstrap-linked kernel.
This output causes the ld command to create a kernel image that
is identical to the bootstrap-linked kernel running on your system. The kernel
image is written to the file named by the -o flag, in this case
the vmunix.image file.
1.2 Debugging Kernel Programs
Kernel programs can be difficult
to debug because you normally cannot control kernel execution. To make debugging
kernel programs more convenient, the Digital UNIX system provides the kdebug debugger. The kdebug debugger is code that resides
inside the kernel and allows you to use the dbx debugger to control
execution of a running kernel in the same manner as you control execution
of a user space program. To debug a kernel program in this manner, follow
these steps:
1.3 Debugging the Running Kernel
When you have problems with a process or set of processes, you can attempt
to identify the problem by debugging the running kernel. You might also
invoke the debugger on the running kernel to examine the values assigned to
system parameters. (You can modify the value of the parameters using the debugger,
but this practice can cause problems with the kernel and should be avoided.)
# dbx -k /vmunix /dev/mem
This command invokes dbx with the kernel debugging
flag, -k, which maps kernel addresses to make kernel debugging
easier. The /vmunix and /dev/mem parameters cause the
debugger to operate on the running kernel.
(dbx) kps
PID COMM
00000 kernel idle
00001 init
00014 kloadsrv
00016 update
If you want to trace the execution of the kloadsrv daemon,
use the dbx set command to assign the process ID of the kloadsrv
daemon to the $pid symbol. Then, enter the t command:
.
.
.
(dbx) set $pid = 14
(dbx) t
> 0 thread_block() ["/usr/sde/build/src/kernel/kern/sched_prim.c":1623, 0xfffffc0000\
43d77c]
1 mpsleep(0xffffffff92586f00, 0x11a, 0xfffffc0000279cf4, 0x0, 0x0) ["/usr/sde/build\
/src/kernel/bsd/kern_synch.c":411, 0xfffffc000040adc0]
2 sosleep(0xffffffff92586f00, 0x1, 0xfffffc000000011a, 0x0, 0xffffffff81274210) ["/usr/sde\
/build/src/kernel/bsd/uipc_socket2.c":654, 0xfffffc0000254ff8]
3 sosbwait(0xffffffff92586f60, 0xffffffff92586f00, 0x0, 0xffffffff92586f00, 0x10180) ["/usr\
/sde/build/src/kernel/bsd/uipc_socket2.c":630, 0xfffffc0000254f64]
4 soreceive(0x0, 0xffffffff9a64f658, 0xffffffff9a64f680, 0x8000004300000000, 0x0) ["/usr/sde\
/build/src/kernel/bsd/uipc_socket.c":1297, 0xfffffc0000253338]
5 recvit(0xfffffc0000456fe8, 0xffffffff9a64f718, 0x14000c6d8, 0xffffffff9a64f8b8,\
0xfffffc000043d724) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":1002,\
0xfffffc00002574f0]
6 recvfrom(0xffffffff81274210, 0xffffffff9a64f8c8, 0xffffffff9a64f8b8, 0xffffffff9a64f8c8,\
0xfffffc0000457570) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":860,\
0xfffffc000025712c]
7 orecvfrom(0xffffffff9a64f8b8, 0xffffffff9a64f8c8, 0xfffffc0000457570, 0x1, 0xfffffc0000456fe8)\
["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":825, 0xfffffc000025708c]
8 syscall(0x120024078, 0xffffffffffffffff, 0xffffffffffffffff, 0x21, 0x7d) ["/usr/sde\
/build/src/kernel/arch/alpha/syscall_trap.c":515, 0xfffffc0000456fe4]
9 _Xsyscall(0x8, 0x12001acb8, 0x14000eed0, 0x4, 0x1400109d0) ["/usr/sde/build\
/src/kernel/arch/alpha/locore.s":1046, 0xfffffc00004486e4]
(dbx) exit
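If you have saved a kps listing to a file or captured it in a script, you can look up the process ID to assign to $pid instead of reading it by eye. The following is a minimal sketch under stated assumptions: the function name kps_pid is hypothetical, the listing is copied from the example above rather than captured from dbx, and only the first word of the COMM column is compared, so a multiword name such as "kernel idle" is not matched.

```shell
#!/bin/sh
# kps_pid (hypothetical helper): reads a kps-style listing on standard
# input and prints the PID of the command named by the first argument.
# The header line is skipped, and "$1 + 0" strips the leading zeros.
kps_pid() {
    awk -v name="$1" 'NR > 1 && $2 == name { print $1 + 0 }'
}

# Sample listing copied from the kps example in this section.
kps_listing='PID COMM
00000 kernel idle
00001 init
00014 kloadsrv
00016 update'

printf '%s\n' "$kps_listing" | kps_pid kloadsrv
```

For the sample listing, the function prints 14, which is the value used with the set $pid command in the preceding example.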
1.4 Analyzing a Crash Dump File
If your system crashes, you can often find the cause of the crash by
using dbx or kdbx to debug or analyze a crash dump file.
cpu_ip_intr: panic request
Once each CPU has recorded the system panic, execution continues
only on the master CPU. All other CPUs in the SMP system stop execution.