This manual describes how to use Digital UNIX tools to debug kernel programs and the kernel. It also includes information about managing and analyzing crash dump files.
In addition to the information provided here, tracing a kernel problem can require a basic understanding of one or more of the following technical areas:
See the Alpha Architecture Handbook for an overview of the Alpha hardware architecture and a description of the 64-bit Alpha RISC instruction set.
See the Alpha Architecture Reference Manual for information on how the Digital UNIX operating system interfaces with the hardware.
This chapter provides an overview of the following topics:
You cannot directly debug a bootstrap-linked kernel because you must supply the name of an image to the kernel debugging tools. Without the image, the tools have no access to symbol names, variable names, and so on. Therefore, the first step in any kernel debugging effort is to determine whether your kernel was linked at bootstrap time. If the kernel was linked at bootstrap time, you must then build a kernel image file to use for debugging purposes.
The best way to determine whether your system is bootstrap linked or statically linked is to use the file command to test the type of file from which your system was booted. If your system is a bootstrap-linked system, it was booted from an ASCII text file; otherwise, it was booted from an executable image file. For example, issue the following command to determine the type of file from which your system was booted:
# /usr/bin/file `/usr/sbin/sizer -b`
/etc/sysconfigtab: ascii text

The sizer -b command returns the name of the file from which the system was booted. This file name is passed to the file command, which reports that the system was booted from an ASCII text file. The output shown in the preceding example therefore indicates that the system is a bootstrap-linked system. If the system had been booted from an executable image file named vmunix, the output from the file command would have appeared as follows:
vmunix: COFF format alpha executable or object module not stripped

If your system is running a bootstrap-linked kernel, build a kernel image identical to the running bootstrap-linked kernel by entering the following command:
# /usr/bin/ld -o vmunix.image `/usr/sbin/sizer -m`

The output from the sizer -m command is a list of the exact modules and linker flags used to build the currently running bootstrap-linked kernel. This output causes the ld command to create a kernel image that is identical to the bootstrap-linked kernel running on your system. The kernel image is written to the file named by the -o flag, in this case the vmunix.image file.
Once you create this image, you can debug the kernel as described in this manual, using the dbx, kdbx, and kdebug debuggers. When you invoke the dbx or kdbx debugger, remember to specify the name of the kernel image file you created with the ld command, such as the vmunix.image file shown here.
When you are finished debugging the kernel, you can remove the kernel image file you created for debugging purposes.
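The procedure described above can be collected into a few shell functions. The following is a minimal sketch, assuming a Bourne-compatible shell on a Digital UNIX system that provides /usr/sbin/sizer. The function names is_bootstrap_linked, build_debug_image, and check_and_build are hypothetical, and the bootstrap-link test simply looks for the string "ascii text" in the file output shown earlier.

```shell
# Hypothetical helpers wrapping the file, sizer, and ld commands shown above.
# Assumes a Digital UNIX system where /usr/sbin/sizer is available.

# is_bootstrap_linked: returns 0 if the given file command output describes
# an ASCII text file, which indicates a bootstrap-linked system.
is_bootstrap_linked() {
    case "$1" in
        *"ascii text"*) return 0 ;;
        *) return 1 ;;
    esac
}

# build_debug_image: links a kernel image identical to the running
# bootstrap-linked kernel. The image name defaults to vmunix.image.
build_debug_image() {
    image=${1:-vmunix.image}
    # sizer -m prints the modules and linker flags of the running kernel.
    # The substitution is left unquoted so the shell splits it into
    # separate ld arguments.
    /usr/bin/ld -o "$image" `/usr/sbin/sizer -m`
}

# check_and_build: runs file on the boot file reported by sizer -b and
# builds a debug image only when the kernel is bootstrap linked.
check_and_build() {
    image=${1:-vmunix.image}
    boot_file=`/usr/sbin/sizer -b`
    boot_type=`/usr/bin/file "$boot_file"`
    if is_bootstrap_linked "$boot_type"; then
        build_debug_image "$image" && echo "created $image"
    else
        echo "$boot_file is an executable image; debug it directly"
    fi
}
```

On the bootstrap-linked system from the earlier example, running check_and_build would create vmunix.image in the current directory. You would then name that file when invoking dbx or kdbx, and remove it when you finish debugging.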
The Digital UNIX system also provides the kdbx debugger, which is designed especially for debugging kernel code. This debugger contains a number of special commands, called extensions, that allow you to display kernel data structures in a readable format. Section 2.2 describes using kdbx and its extensions. (You cannot use the kdbx debugger with the kdebug debugger.)
Another feature of kdbx is that you can customize it by writing your own extensions. The system contains a set of kdbx library routines that you can use to create extensions that display kernel data structures in ways that are meaningful to you. Chapter 3 describes writing kdbx extensions.
You use the dbx or kdbx debugger to examine the state of processes running on your system and to examine the value of system parameters. The kdbx debugger provides special commands, called extensions, that you can use to display kernel data structures. (Section 2.2.3 describes the extensions.)
To examine the state of processes, you invoke the debugger (as described in Section 2.1 or Section 2.2) using the following command:
# dbx -k /vmunix /dev/mem

This command invokes dbx with the kernel debugging flag, -k, which maps kernel addresses to make kernel debugging easier. The /vmunix and /dev/mem parameters cause the debugger to operate on the running kernel.
Once in the dbx environment, you use dbx commands to display process IDs and trace execution of processes. You can perform the same tasks using the kdbx debugger. The following example shows the dbx command you use to display process IDs:
(dbx) kps
  PID COMM
00000 kernel idle
00001 init
00014 kloadsrv
00016 update
.
.
.

If you want to trace the execution of the kloadsrv daemon, use the dbx set command to set the $pid symbol to the process ID of the kloadsrv daemon. Then, enter the t command:
(dbx) set $pid = 14
(dbx) t
>  0 thread_block() ["/usr/sde/build/src/kernel/kern/sched_prim.c":1623, 0xfffffc0000\
43d77c]
   1 mpsleep(0xffffffff92586f00, 0x11a, 0xfffffc0000279cf4, 0x0, 0x0) ["/usr/sde/build\
/src/kernel/bsd/kern_synch.c":411, 0xfffffc000040adc0]
   2 sosleep(0xffffffff92586f00, 0x1, 0xfffffc000000011a, 0x0, 0xffffffff81274210) ["/usr/sde\
/build/src/kernel/bsd/uipc_socket2.c":654, 0xfffffc0000254ff8]
   3 sosbwait(0xffffffff92586f60, 0xffffffff92586f00, 0x0, 0xffffffff92586f00, 0x10180) ["/usr\
/sde/build/src/kernel/bsd/uipc_socket2.c":630, 0xfffffc0000254f64]
   4 soreceive(0x0, 0xffffffff9a64f658, 0xffffffff9a64f680, 0x8000004300000000, 0x0) ["/usr/sde\
/build/src/kernel/bsd/uipc_socket.c":1297, 0xfffffc0000253338]
   5 recvit(0xfffffc0000456fe8, 0xffffffff9a64f718, 0x14000c6d8, 0xffffffff9a64f8b8,\
0xfffffc000043d724) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":1002,\
0xfffffc00002574f0]
   6 recvfrom(0xffffffff81274210, 0xffffffff9a64f8c8, 0xffffffff9a64f8b8, 0xffffffff9a64f8c8,\
0xfffffc0000457570) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":860,\
0xfffffc000025712c]
   7 orecvfrom(0xffffffff9a64f8b8, 0xffffffff9a64f8c8, 0xfffffc0000457570, 0x1, 0xfffffc0000456fe8)\
["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":825, 0xfffffc000025708c]
   8 syscall(0x120024078, 0xffffffffffffffff, 0xffffffffffffffff, 0x21, 0x7d) ["/usr/sde\
/build/src/kernel/arch/alpha/syscall_trap.c":515, 0xfffffc0000456fe4]
   9 _Xsyscall(0x8, 0x12001acb8, 0x14000eed0, 0x4, 0x1400109d0) ["/usr/sde/build\
/src/kernel/arch/alpha/locore.s":1046, 0xfffffc00004486e4]
(dbx) exit
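When many processes are running, finding a daemon's process ID in the kps listing by eye can be tedious. The following is a minimal sketch, assuming you have captured the kps output as text (for example, by copying it from a dbx session) and that its format matches the example above, with the PID in the first column and the command name in the second. The function name pid_set_command is hypothetical; it uses awk to look up the PID and prints the corresponding dbx set command.

```shell
# pid_set_command: hypothetical helper. Reads kps output on standard input
# and prints the dbx command that sets $pid to the process ID of the named
# command. Exits with status 1 if no matching command is found.
# Assumes PID in column 1 and command name in column 2, as in the kps example.
pid_set_command() {
    awk -v name="$1" '
        $2 == name {
            # Adding 0 converts a zero-padded PID such as 00014 to 14.
            printf "set $pid = %d\n", $1 + 0
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

For example, piping the kps listing shown above into pid_set_command kloadsrv prints set $pid = 14, the same command used in the trace example.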
Often, looking at the trace of a process that is hanging or has unexpectedly stopped running reveals the problem. Once you find the problem, you can modify system parameters, restart daemons, or take other corrective actions.
For more information about the commands you can use to debug the running kernel, see Section 2.1 and Section 2.2.
The operating system can crash because one of the following occurs:
When a system hangs, it is often necessary to force the system to create dumps that you can analyze to determine why the system hung. Section 4.7 describes the procedure for forcing a crash dump of a hung system.
The system crashes or hangs because it cannot continue executing. Normally, even in the case of a hardware exception, the operating system detects the problem. (For example, a machine-checking routine might discover a hardware problem and begin the process of crashing the system.) In general, the operating system performs the following steps when it detects a problem from which it cannot recover:
The panic function saves the contents of registers and sends the panic string (a message describing the reason for the system panic) to the error logger and the console terminal.
If the system is a Symmetric Multiprocessing (SMP) system, the panic function notifies the other CPUs in the system that a panic has occurred. The other CPUs then also execute the panic function and record the following panic string:
cpu_ip_intr: panic request

Once each CPU has recorded the system panic, execution continues only on the master CPU. All other CPUs in the SMP system stop execution.
The boot function records the stack.
The dump function copies core memory into swap partitions and the system stops running or the reboot process begins. Console environment variables control whether the system reboots automatically. (The Installation Guide describes these environment variables.)
At system reboot time, the copy of core memory saved in the swap partitions is copied into a file, called a crash dump file. You can analyze the crash dump file to determine what caused the crash. For information about managing crash dumps and crash dump files, see Chapter 4. For examples of analyzing crash dump files, see Chapter 5.