Finding problems in crash dump files is a task that takes practice and experience to do well. Exactly how you determine what caused a crash varies depending on how the system crashed. The cause of some crashes is relatively easy to determine, while finding the cause of other crashes is difficult and time-consuming.
This chapter helps you analyze crash dump files by providing the following information:
Guidelines for examining crash dump files (Section 4.1)
Examples of identifying the cause of a software panic (Section 4.2)
Examples of identifying the cause of a hardware trap (Section 4.3)
An example of finding a panic string that is not in the current thread (Section 4.4)
An example of identifying the cause of a crash on an SMP system (Section 4.5)
For information about how crash dump files are created, see the System Administration manual.
4.1 Guidelines for Examining Crash Dump Files
In examining crash dump files, there is no one way to determine the cause of a system crash. However, following these steps should help you identify the events that lead to most crashes:
Gather some facts about the system; for example, the operating system type, version number, revision level, and hardware configuration.
Locate the thread that was executing at the time of the crash. Most likely, this thread contains the events that led to the panic.
Look at the panic string, if one exists. This string is contained in the preserved message buffer (pmsgbuf) and in the panicstr global variable. The panic string gives a reason for the crash.
Identify the function that called the panic or trap function. That function is the one that caused the system to crash.
Examine the source code for the function that caused the crash to infer the error that caused the crash. You might also need to examine related data structures and functions that appear earlier in the stack. An earlier function might have passed corrupt data to the function that caused a crash.
Determine whether you can fix the problem.
If the system crashed because of a hardware problem (for example, because a memory board became corrupt), correcting the problem probably requires repairing or replacing the hardware. You might be able to disconnect the hardware that caused the problem and operate without it until it is repaired or replaced. If you need to repair or replace hardware, call your support representative.
If a software panic caused the crash, you can fix the problem if it is in software you or someone else at your company wrote. Otherwise, you must request that the producer of the software fix the problem by calling your support representative.
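The step of identifying the function that called panic can be sketched as a small script that scans a saved stack trace. This is only an illustration: the helper names (frames, caller_of_panic) are hypothetical, the sample trace is abbreviated from the dbx example in Section 4.2.1, and the pattern assumes each frame starts with a frame number followed by the function name.

```python
import re

# Matches the start of a dbx stack frame such as "> 0 boot(" or "  2 ialloc(".
# Assumption: every frame begins with its number and the function name.
FRAME_RE = re.compile(r"^\s*>?\s*(\d+)\s+(\w+)\(")

# Abbreviated sample (paths and some arguments shortened for illustration).
SAMPLE = """\
> 0 boot(paniced = 0, arghowto = 0) ["machdep.c":753, 0xfffffc00003c4ae4]
  1 panic(s = 0xfffffc000044b618) ["subr_prf.c":1119, 0xfffffc00002bdbb0]
  2 ialloc(pip = 0xffffffff8c6acc40, ipref = 57664) ["ufs_alloc.c":501, 0xfffffc00002dab48]
  3 maknode(vap = 0xffffffff8cf95c50) ["ufs_vnops.c":2842, 0xfffffc00002ea500]
"""


def frames(trace_text):
    """Return the function names in a stack trace, most recent frame first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(2))
    return names


def caller_of_panic(trace_text):
    """Return the function in the frame that follows panic, or None."""
    names = frames(trace_text)
    for index, name in enumerate(names[:-1]):
        if name == "panic":
            return names[index + 1]
    return None


if __name__ == "__main__":
    print(caller_of_panic(SAMPLE))
```

If the function returned is trap, the panic came from a hardware trap and the stack trace alone does not show the cause; Section 4.3 describes how to continue in that case.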
4.2 Identifying a Crash Caused by a Software Problem
When software encounters a state from which it cannot continue, it calls the system panic function. For example, if the software attempts to access an area of memory that is protected from access, the software might call the panic function and crash the system.
In most cases, only system programmers can fix the problem that caused a panic because most panics are caused by software errors. However, some system panics reflect other problems. For example, if a memory board becomes corrupted, software that attempts to write to that board might call the panic function and crash the system. In this case, the solution might be to replace the memory board and reboot the system.
The sections that follow demonstrate finding the cause of a software panic using the dbx and kdbx debuggers. You can also examine output from the crashdc crash data collection tool to help you determine the cause of a crash. Sample output from crashdc is shown and explained in Appendix A.
4.2.1 Using dbx to Determine the Cause of a Software Panic
The following example shows a method for identifying a software panic with the dbx debugger:
# dbx -k vmunix.0 vmzcore.0
dbx version 5.0
Type 'help' for help.
stopped at [boot:753 ,0xfffffc00003c4ae4] Source not available
(dbx) p panicstr     [1]
0xfffffc000044b648 = "ialloc: dup alloc"
(dbx) t     [2]
> 0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep.\
c":753, 0xfffffc00003c4ae4]
  1 panic(s = 0xfffffc000044b618 = "mode = 0%o, inum = %d, pref = %d fs = %s\n")\
["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002bdbb0]
  2 ialloc(pip = 0xffffffff8c6acc40, ipref = 57664, mode = 0, ipp = 0xffffffff8c\
f95af8) ["../../../../src/kernel/ufs/ufs_alloc.c":501, 0xfffffc00002dab48]
  3 maknode(vap = 0xffffffff8cf95c50, ndp = 0xffffffff8cf922f8, ipp = 0xffffffff\
8cf95b60) ["../../../../src/kernel/ufs/ufs_vnops.c":2842, 0xfffffc00002ea500]
  4 ufs_create(ndp = 0xffffffff8cf922f8, vap = 0xfffffc00002fe0a0) ["../../../..\
/src/kernel/ufs/ufs_vnops.c":602, 0xfffffc00002e771c]
  5 vn_open(ndp = 0xffffffff8cf95d18, fmode = 4618, cmode = 416) ["../../../../s\
rc/kernel/vfs/vfs_vnops.c":258, 0xfffffc00002fe138]
  6 copen(p = 0xffffffff8c6efba0, args = 0xffffffff8cf95e50, retval = 0xffffffff\
8cf95e40, compat = 0) ["../../../../src/kernel/vfs/vfs_syscalls.c":1379, 0xfffffc\
00002fb890]
  7 open(p = 0xffffffff8cf95e40, args = (nil), retval = 0x7f4) ["../../../../src\
/kernel/vfs/vfs_syscalls.c":1340, 0xfffffc00002fb7bc]
  8 syscall(ep = 0xffffffff8cf95ef8, code = 45) ["../../../../src/kernel/arch/al\
pha/syscall_trap.c":532, 0xfffffc00003cfa34]
  9 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":703, 0xfffffc00003\
c31e0]
(dbx) q
Display the panic string (panicstr). The panic string shows that the ialloc function called the panic function.
Perform a stack trace. This confirms that the ialloc function at line 501 in file ufs_alloc.c called the panic function.
The following example shows a method of finding a software panic with the kdbx debugger:
# kdbx -k vmunix.3 vmzcore.3
dbx version 5.0
Type 'help' for help.
stopped at [boot:753 ,0xfffffc00003c4b04] Source not available
(kdbx) sum     [1]
Hostname : system.dec.com
cpu: Digital AlphaStation 600 5/266 avail: 1
Boot-time: Tue Oct 6 15:16:41 1998
Time: Tue Oct 27 13:52:11 1998
Kernel : OSF1 release V5.0 version 688.2 (alpha)
(kdbx) p panicstr     [2]
0xfffffc0000453ea0 = "wdir: compact2"
(kdbx) t     [3]
> 0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep\
.c":753, 0xfffffc00003c4b04]
  1 panic(s = 0xfffffc00002e0938 = "p") ["../../../../src/kernel/bsd/subr_prf.c"\
:1119, 0xfffffc00002bdbb0]
  2 direnter(ip = 0xffffffff00000000, ndp = 0xffffffff9d38db60) ["../../../../sr\
c/kernel/ufs/ufs_lookup.c":986, 0xfffffc00002e2adc]
  3 ufs_mkdir(ndp = 0xffffffff9d38a2f8, vap = 0x100000020) ["../../../../src/ker\
nel/ufs/ufs_vnops.c":2383, 0xfffffc00002e9cbc]
  4 mkdir(p = 0xffffffff9c43d7c0, args = 0xffffffff9d38de50, retval = 0xffffffff\
9d38de40) ["../../../../src/kernel/vfs/vfs_syscalls.c":2579, 0xfffffc00002fd930]
  5 syscall(ep = 0xffffffff9d38def8, code = 136) ["../../../../src/kernel/arch/a\
lpha/syscall_trap.c":532, 0xfffffc00003cfa54]
  6 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":703, 0xfffffc00003\
c3200]
(kdbx) q
dbx (pid 29939) died. Exiting...
Use the sum command to get a summary of the system.
Display the panic string (panicstr).
Perform a stack trace of the current thread block. The stack trace shows that the direnter function, at line 986 in file ufs_lookup.c, called the panic function.
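The stack traces in these examples wrap long frames with a trailing backslash, which makes the source file and line number harder to pick out. The following is a minimal sketch, assuming that wrapping convention, of joining a wrapped frame and extracting the function, file, and line. The helper names (unwrap, parse_frame) are hypothetical; the sample frame is frame 2 of the kdbx example above.

```python
import posixpath
import re

# Frame 2 from the kdbx example, including the backslash line wrap.
WRAPPED = (
    '  2 direnter(ip = 0xffffffff00000000, ndp = 0xffffffff9d38db60) ["../../../../sr\\\n'
    'c/kernel/ufs/ufs_lookup.c":986, 0xfffffc00002e2adc]\n'
)

# Frame number, function name, argument list, then ["path":line
FRAME_LOC_RE = re.compile(r'(\d+)\s+(\w+)\(.*?\)\s*\["([^"]+)":\s*(\d+)')


def unwrap(text):
    """Join lines that were wrapped with a trailing backslash."""
    return re.sub(r"\\[ \t]*\n[ \t]*", "", text)


def parse_frame(frame_text):
    """Return (function, source file name, line number), or None if no match."""
    match = FRAME_LOC_RE.search(frame_text)
    if match is None:
        return None
    return (
        match.group(2),
        posixpath.basename(match.group(3)),
        int(match.group(4)),
    )


if __name__ == "__main__":
    print(parse_frame(unwrap(WRAPPED)))
```

Frames whose path is truncated in the listing do not match the pattern, so parse_frame returns None for them rather than guessing.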
Occasionally, your system might crash due to a hardware error. During a hardware exception, the hardware encounters a situation from which it cannot continue. For example, the hardware might detect a parity error in a portion of memory that is necessary for its successful operation. When a hardware exception occurs, the hardware stores information in registers and stops operation. When control returns to the software, it normally calls the panic function and the system crashes.
The sections that follow show how to identify hardware traps using the dbx and kdbx debuggers. You can also examine output from the crashdc crash data collection tool to help you determine the cause of a crash. Sample output from crashdc is shown and explained in Appendix A.
4.3.1 Using dbx to Determine the Cause of a Hardware Error
The following example shows a method for identifying a hardware trap with the dbx debugger:
# dbx -k vmunix.1 vmzcore.1
dbx version 5.0
Type 'help' for help.
(dbx) sh strings vmunix.1 | grep '(Rev'     [1]
Tru64 UNIX V5.0-1 (Rev. 961); Wed Mar 18 16:12:36 EST 1999
(dbx) p utsname     [2]
struct {
    sysname = "OSF1"
    nodename = "system.dec.com"
    release = "V5.0"
    version = "961"
    machine = "alpha"
}
(dbx) p panicstr     [3]
0xfffffc0000489350 = "trap: Kernel mode prot fault\n"
(dbx) t     [4]
> 0 boot(paniced = 0, arghowto = 0) ["/usr/sde/alpha/build/alpha.nightly/src/ker\
nel/arch/alpha/machdep.c":
  1 panic(s = 0xfffffc0000489350 = "trap: Kernel mode prot fault\n") ["/usr/sde\
/alpha/build/alpha.nightly/src/kernel/bsd/subr_prf.c":1099, 0xfffffc00002c0730]
  2 trap() ["/usr/sde/alpha/build/alpha.nightly/src/kernel/arch/alpha/trap.c":54\
4, 0xfffffc00003e0c78]
  3 _XentMM() ["/usr/sde/alpha/build/alpha.nightly/src/kernel/arch/alpha/locore.\
s":702, 0xfffffc00003d4ff4]
(dbx) kps     [5]
  PID  COMM
00000  kernel idle
00001  init
00002  device server
00003  exception hdlr
00663  ypbind
00018  cfgmgr
00219  automount
.
.
.
00265  cron
00293  xdm
02311  inetd
00278  lpd
01443  csh
01442  rlogind
01646  rlogind
01647  csh
(dbx) p $pid     [6]
2311
(dbx) p *pmsgbuf     [7]
struct {
    msg_magic = 405601
    msg_bufx = 62
    msg_bufr = 3825
    msg_bufc = "unknown flag
printstate: unknown flag
printstate: unknown flag
de: table is full
<3>vnode: table is full
.
.
.
<3>arp: local IP address 0xffffffff82b40429 in use by hardware address 08:00:2B:20:19:CD
<3>arp: local IP address 0xffffffff82b40429 in use by hardware address 08:00:2B:2B:F6:3B
va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c
panic: trap: Kernel mode prot fault
syncing disks... 3 3 done
printstate: unknown flag
printstate: unknown flag
printstate: unknown flag
printstate: unknown flag
printstate: u"
}
(dbx) px savedefp
0xffffffff89b2b4e0
(dbx) p savedefp
0xffffffff89b2b4e0
(dbx) p savedefp[28]
18446739675666356012
(dbx) px savedefp[28]     [8]
0xfffffc000032972c
(dbx) savedefp[28]/i     [9]
[nfs_putpage:2344, 0xfffffc000032972c] ldl r5, 40(r1)
(dbx) savedefp[23]/i     [10]
[ubc_invalidate:1768, 0xfffffc0000315fe0] stl r0, 84(sp)
(dbx) func nfs_putpage     [11]
(dbx) file     [12]
/usr/sde/alpha/build/alpha.nightly/src/kernel/kern/sched_prim.c
(dbx) func ubc_invalidate     [13]
ubc_invalidate: Source not available
(dbx) file     [14]
/usr/sde/alpha/build/alpha.nightly/src/kernel/vfs/vfs_ubc.c
(dbx) q
You can use the sh command to pass commands to the shell. In this case, enter the strings and grep commands to extract the operating system revision number from the vmunix.1 dump file.
Display the utsname structure to obtain more information about the operating system version.
Display the panic string (panicstr). The panic function was called by a trap function.
Perform a stack trace. This confirms that the trap function called the panic function. However, the stack trace does not show what caused the trap.
Enter the kps command to see what processes were running when the system crashed.
Display the process ID (PID) that was current at the time of the crash. In this case, the PID was 2311, which the kps command output identifies as the inetd daemon.
Display the preserved message buffer (pmsgbuf). Note that this buffer contains the program counter (pc) value, which is displayed in the following line:
va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c
Display register 28 of the exception frame pointer (savedefp). This register always contains the pc value. You can always obtain the pc value from either the preserved message buffer or register 28 of the exception frame pointer.
Disassemble the instruction at the pc to determine where execution stopped. At the time of the crash, the pc pointed into the nfs_putpage function at line 2344.
Disassemble the instruction at the return address to determine the caller. At the time of the crash, the return address pointed into the ubc_invalidate function at line 1768.
Point the dbx debugger to the nfs_putpage function.
Display the name of the source file that contains the nfs_putpage function.
Point the dbx debugger to the ubc_invalidate function.
Display the name of the source file that contains the ubc_invalidate function.
This example shows that the ubc_invalidate function, which resides in the /vfs/vfs_ubc.c file at line number 1768, called the nfs_putpage function at line number 2344 in the /kern/sched_prim.c file and the system stopped.
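The dbx example above prints register 28 of the exception frame both as a decimal value (p) and as a hexadecimal address (px), and the same pc value appears in the preserved message buffer. The following sketch shows how the two forms relate and how the pc can be pulled from a saved copy of the message buffer. The helper names are hypothetical, and the pattern assumes the "pc=" field is printed as 16 hexadecimal digits without a 0x prefix, as in this example.

```python
import re

# Line copied from the pmsgbuf output in the example above.
TRAP_LINE = "va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c"

# Assumption: the trap message prints pc as 16 hex digits with no 0x prefix.
PC_RE = re.compile(r"\bpc=([0-9a-fA-F]{16})\b")


def to_kernel_hex(value):
    """Format an unsigned 64-bit integer as a 0x-prefixed, 16-digit address."""
    return "0x%016x" % (value & 0xFFFFFFFFFFFFFFFF)


def pc_from_message_buffer(text):
    """Return the pc recorded in a trap message as an integer, or None."""
    match = PC_RE.search(text)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    # Decimal value printed by "p savedefp[28]" in the example.
    print(to_kernel_hex(18446739675666356012))
    print(pc_from_message_buffer(TRAP_LINE) == 18446739675666356012)
```

Both sources yield the same address, 0xfffffc000032972c, which is the value disassembled with savedefp[28]/i in the example.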
The following example shows a method for identifying a hardware error with the kdbx debugger:
# kdbx -k vmunix.5 vmzcore.5
dbx version 5.0
Type 'help' for help.
stopped at [boot:753 ,0xfffffc00003c4b04] Source not available
(kdbx) sum     [1]
Hostname : system.dec.com
cpu: Digital AlphaStation 600 5/266 avail: 1
Boot-time: Tue Oct 6 15:16:41 1998
Time: Tue Oct 27 13:52:11 1998
Kernel : OSF1 release V5.0 version 688.2 (alpha)
(kdbx) p panicstr     [2]
0xfffffc0000471030 = "ECC Error"
(kdbx) t     [3]
> 0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep.\
c":753, 0xfffffc00003c4b04]
  1 panic(s = 0x670) ["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002\
bdbb0]
  2 kn15aa_machcheck(type = 1648, cmcf = 0xfffffc00000f8050 = , framep = 0xffff\
ffff94f79ef8) ["../../../../src/kernel/arch/alpha/hal/kn15aa.c":1269, 0xfffffc000\
03da62c]
  3 mach_error(type = -1795711240, phys_logout = 0x3, regs = 0x6) ["../../../../s\
rc/kernel/arch/alpha/hal/cpusw.c":323, 0xfffffc00003d7dc0]
  4 _XentInt() ["../../../../src/kernel/arch/alpha/locore.s":609, 0xfffffc00003c3\
148]
(kdbx) q
dbx (pid 337) died. Exiting...
Use the sum command to get a summary of the system.
Display the panic string (panicstr).
Perform a stack trace. Because the kn15aa_machcheck function (which is a hardware checking function) called the panic function, the system crash was probably the result of a hardware error.
The dbx and kdbx debuggers have the concept of the current thread. In many cases, when you invoke one of the debuggers to analyze a crash dump, the panic string is in the current thread. At times, however, the current thread contains no panic string and so is probably not the thread that caused the crash.
The following example shows a method for stepping through kernel threads to identify the events that led to the crash:
# dbx -k ./vmunix.2 ./vmzcore.2
dbx version 5.0
Type 'help' for help.
thread 0x8d431c68 stopped at [thread_block:1305 +0x114,0xfffffc000033961c] \
Source not available
(dbx) p panicstr     [1]
0xfffffc000048a0c8 = "kernel memory fault"
(dbx) t     [2]
> 0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1305, 0xfffffc0\
00033961c]
  1 mpsleep(chan = 0xffffffff8d4ef450 = , pri = 282, wmesg = 0xfffffc000046f\
290 = "network", timo = 0, lockp = (nil), flags = 0) ["../../../../src/kernel/\
bsd/kern_synch.c":267, 0xfffffc00002b772c]
  2 sosleep(so = 0xffffffff8d4ef408, addr = 0xffffffff906cfcf4 = "^P", pri = 2 \
82,tmo = 0) ["../../../../src/kernel/bsd/uipc_socket2.c":612, 0xfffffc00002d3784]
  3 accept1(p = 0xffffffff8f8bfde8, args = 0xffffffff906cfe50, retval = 0xffff \
ffff906cfe40, compat_43 = 1) ["../../../../src/kernel/bsd/uipc_syscalls.c":300 \
, 0xfffffc00002d4c74]
  4 oaccept(p = 0xffffffff8d431c68, args = 0xffffffff906cfe50, retval = 0xffff \
ffff906cfe40) ["../../../../src/kernel/bsd/uipc_syscalls.c":250, 0xfffffc00002d\
4b0c]
  5 syscall(ep = 0xffffffff906cfef8, code = 99, sr = 1) ["../../../../src/kern \
el/arch/alpha/syscall_trap.c":499, 0xfffffc00003ec18c]
  6 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":675, 0xfffffc000\
03df96c]
(dbx) tlist     [3]
thread 0x8d431a60 stopped at [thread_block:1305 +0x114,0xfffffc000033961c] \
Source not available
thread 0x8d431858 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
thread 0x8d431650 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
thread 0x8d431448 stopped at [thread_block:1305 +0x114,0xfffffc000033961c] \
Source not available
thread 0x8d431240 stopped at [thread_block:1305 +0x114,0xfffffc000033961c] \
Source not available
.
.
.
thread 0x8d42f5d0 stopped at [boot:696 ,0xfffffc00003e119c] Source not \
available
thread 0x8d42f3c8 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
thread 0x8d42f1c0 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
thread 0x8d42efb8 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
thread 0x8d42dd70 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] \
Source not available
(dbx) tset 0x8d42f5d0     [4]
thread 0x8d42f5d0 stopped at [boot:696 ,0xfffffc00003e119c] Source not ava\
ilable
(dbx) t     [5]
> 0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/mac\
hdep.c":694, 0xfffffc00003e1198]
  1 panic(s = 0xfffffc000048a098 = " sp contents at time of fault: 0x%l01\
6x\r\n\n") ["../../../../src/kernel/bsd/subr_prf.c":1110, 0xfffffc00002beef4]
  2 trap() ["../../../../src/kernel/arch/alpha/trap.c":677, 0xfffffc00003ecc70]
  3 _XentMM() ["../../../../src/kernel/arch/alpha/locore.s":828, 0xfffffc000\
03dfb1c]
  4 pmap_release_page(pa = 18446744071785586688) ["../../../../src/kernel/ar\
ch/alpha/pmap.c":640, 0xfffffc00003e3ecc]
  5 put_free_ptepage(page = 5033216) ["../../../../src/kernel/arch/alpha/pma\
p.c":534, 0xfffffc00003e3ca0]
  6 pmap_destroy(map = 0xffffffff8d5bc428) ["../../../../src/kernel/arch/alp\
ha/pmap.c":1891, 0xfffffc00003e6140]
  7 vm_map_deallocate(map = 0xffffffff81930ee0) ["../../../../src/kernel/vm/\
vm_map.c":482, 0xfffffc00003d03c0]
  8 task_deallocate(task = 0xffffffff8d568d48) ["../../../../src/kernel/kern\
/task.c":237, 0xfffffc000033c1dc]
  9 thread_deallocate(thread = 0x4e4360) ["../../../../src/kernel/kern/threa\
d.c":689, 0xfffffc000033d83c]
 10 reaper_thread() ["../../../../src/kernel/kern/thread.c":1952, 0xfffffc00\
0033e920]
 11 reaper_thread() ["../../../../src/kernel/kern/thread.c":1901, 0xfffffc00\
0033e8ac]
(dbx) q
Display the panic string (panicstr) to view the panic message, if any. This message indicates that a memory fault occurred.
Perform a stack trace of the current thread. Because this thread does not show a call to the panic function, you need to look at other threads.
Examine the system's threads. The thread most likely to contain the panic is the boot thread because the boot function always executes immediately before the system crashes. If the boot thread does not exist, you must examine every thread of every process in the process list.
Point dbx to the boot thread at address 0x8d42f5d0.
In this example, the problem is in the pmap_release_page function at line 640 of the pmap.c file.
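When the tlist output is long, picking out the boot thread by eye is tedious. The following sketch searches a saved copy of the tlist output for threads stopped in a given function. The helper name threads_stopped_in is hypothetical, and the pattern assumes each entry has the form "thread ADDRESS stopped at [FUNCTION:LINE" shown in the example; the sample is abbreviated from that output.

```python
import re

# Abbreviated from the tlist output in the example above (wraps removed).
TLIST = """\
thread 0x8d431a60 stopped at [thread_block:1305 +0x114,0xfffffc000033961c] Source not available
thread 0x8d431858 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] Source not available
thread 0x8d42f5d0 stopped at [boot:696 ,0xfffffc00003e119c] Source not available
thread 0x8d42f3c8 stopped at [thread_block:1289 +0x18,0xfffffc00003394b8] Source not available
"""

# Assumption: every entry names the thread address, then [function:line.
THREAD_RE = re.compile(r"thread (0x[0-9a-fA-F]+) stopped at \[(\w+):(\d+)")


def threads_stopped_in(tlist_text, function="boot"):
    """Return the addresses of threads stopped in the named function."""
    return [
        address
        for address, name, _line in THREAD_RE.findall(tlist_text)
        if name == function
    ]


if __name__ == "__main__":
    for address in threads_stopped_in(TLIST):
        # The address printed here is the argument to give to tset.
        print("tset", address)
```

An empty result corresponds to the case described above in which no boot thread exists and every thread must be examined.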
If you are analyzing crash dump files from an SMP system, you must first determine on which CPU the panic occurred. You can then continue crash dump analysis as you would on a single processor system.
The following example shows a method for determining which CPU caused the crash and which function called the panic function:
% dbx -k ./vmunix.1 ./vmzcore.1
dbx version 5.0
Type 'help' for help.
stopped at [boot:1494 ,0xfffffc0000442918] Source not available
(dbx) p utsname     [1]
struct {
    sysname = "OSF1"
    nodename = "system.dec.com"
    release = "V5.0"
    version = "688.2"
    machine = "alpha"
}
(dbx) print paniccpu     [2]
0
(dbx) p machine_slot[1]     [3]
struct {
    is_cpu = 1
    cpu_type = 15
    cpu_subtype = 3
    running = 1
    cpu_ticks = {
        [0] 416162
        [1] 83260
        [2] 1401080
        [3] 11821212
        [4] 1095581
    }
    clock_freq = 1024
    error_restart = 0
    cpu_panicstr = 0xfffffc000059f6a0 = "cpu_ip_intr: panic request"
    cpu_panic_thread = 0xffffffff8109a780
}
(dbx) p panicstr     [4]
0xfffffc0000558ad0 = "simple_lock: uninitialized lock"
(dbx) tset active_threads[paniccpu]     [5]
stopped at [boot:1494 ,0xfffffc0000442918]
(dbx) t     [6]
> 0 boot(0x0, 0x4, 0xac35c0000000a, 0xfffffc00004403fc, 0xfffffc000000000e) \
["../../../../src/kernel/arch/alpha/machdep.c":1494, 0xfffffc0000442918]
  1 panic(s = 0xfffffc0000558b40 = "simple_lock: hierarchy violation") ["../\
  2 simple_lock_fault(slp = 0xfffffc00006292f0, state = 0, caller = 0xfffffc\
000046f384, arg = 0xfffffc0000534fd8 = "session.s_fpgrp_lock", fmt = 0xfffffc\
0000558de8 = " class already locked: %s\n", error = 0xfffffc0000558b40 = "\
simple_lock: hierarchy violation") ["../../../../src/kernel/kern/lock.c":1558\
, 0xfffffc00003c34ec]
  3 simple_lock_hierarchy_violation(slp = 0xfffffc000046f384, state = 184467\
39675668500440, caller = 0xfffffc0000558de8, curhier = 5606208) ["../../../..\
/src/kernel/kern/lock.c":1616, 0xfffffc00003c3620]
  4 xnaintr(0xfffffc00005a5158, 0x2, 0xffffffffb53ef238, 0xfffffc000068a754,\
0xfffffc000055891d) ["../../../../src/kernel/io/dec/netif/if_xna.c":1077, 0x\
fffffc000046f384]
  5 _XentInt(0x2, 0xfffffc0000447174, 0xfffffc00005b7d40, 0x2, 0x0) ["../../\
  6 swap_ipl(0x2, 0xfffffc0000447174, 0xfffffc00005b7d40, 0x2, 0x0) ["../../\
  7 boot(0x0, 0x0, 0xffffffffa52c6000, 0xffffffffb53ef1f8, 0xfffffc00003bf4f\
c) ["../../../../src/kernel/arch/alpha/machdep.c":1434, 0xfffffc000044280c]
  8 panic(s = 0xfffffc0000558ad0 = "simple_lock: uninitialized lock") ["../.\
  9 simple_lock_fault(slp = 0xffffffffa52c6000, state = 1719, caller = 0xfff\
ffc00003734c4, arg = (nil), fmt = (nil), error = 0xfffffc0000558ad0 = "simple\
_lock: uninitialized lock") ["../../../../src/kernel/kern/lock.c":1558, 0xfff\
ffc00003c34ec]
 10 simple_lock_valid_violation(slp = 0xfffffc00003734c4, state = 0, caller \
= (nil)) ["../../../../src/kernel/kern/lock.c":1584, 0xfffffc00003c3578]
 11 pgrp_ref(0xffffffffa52c6000, 0x0, 0xfffffc000023ee20, 0x6b7, 0xfffffc000\
05e1080) ["../../../../src/kernel/bsd/kern_proc.c":561, 0xfffffc00003734c4]
 12 exit(0xffffffffb53ef740, 0x100, 0x1, 0xffffffffa42e5e80, 0x1) ["../../..\
/../src/kernel/bsd/kern_exit.c":868, 0xfffffc000023ef30]
 13 rexit(0xffffffff814d2d80, 0xffffffffb53ef758, 0xffffffffb53ef8b8, 0x1000\
00001, 0x0) ["../../../../src/kernel/bsd/kern_exit.c":546, 0xfffffc000023e7dc]
 14 syscall(0xffffffffb53ec000, 0xfffffc000068a300, 0x0, 0x51, 0x1) ["../../\
 15 _Xsyscall(0x8, 0x3ff800e6938, 0x14000d0f0, 0x1, 0x11ffffc18) ["../../../\
(dbx) p *pmsgbuf     [7]
struct {
    msg_magic = 405601
    msg_bufx = 701
    msg_bufr = 134
    msg_bufc = "0.64.143, errno 22
NFS server: stale file handle fs(742,645286) file 573 gen 32779
getattr, client address = 16.140.64.143, errno 22
simple_lock: uninitialized lock
    pc of caller: 0xfffffc00003734c4
    lock address: 0xffffffffa52c6000
    lock class name: (unknown_simple_lock)
    current lock state: 0x00000000e0e9b04a (cpu=0,pc=0xfffffc00e0e9b048,free)
panic (cpu 0): simple_lock: uninitialized lock
simple_lock: hierarchy violation
    pc of caller: 0xfffffc000046f384
    lock address: 0xfffffc00006292f0
    lock info addr: 0xfffffc0000672cc0
    lock class name: xna_softc.lk_xna_softc
    class already locked: session.s_fpgrp_lock
.
.
.
}
(dbx) quit
Display the utsname structure to obtain information about the system.
Display the number of the CPU on which the panic occurred. In this case, CPU 0 was the CPU that started the system panic.
Display the machine_slot structure for a CPU other than the one that started the system panic. Notice that the panic string contains:
cpu_ip_intr: panic request
This panic string indicates that this CPU was not the one that started the system panic. This CPU was requested to panic and stop operation.
Display the panic string, which in this case indicates that a process attempted to obtain an uninitialized lock.
Set the context to the CPU that caused the system panic to begin.
Perform a stack trace on the CPU that started the system panic. Notice that the panic function appears twice in the stack trace. The series of events that resulted in the first call to the panic function caused the crash. The events that occurred after the first call to the panic function were performed after the system was corrupt and during an attempt to save data. Normally, any events that occur after the initial call to the panic function will not help you determine why the system crashed.
In this example, the problem is in the pgrp_ref function on line 561 in the kern_proc.c file.
If you follow the stack trace after the pgrp_ref function, you can see that the pgrp_ref function calls the simple_lock_valid_violation function. This function displays information about simple locks, which might be helpful in determining why the system crashed.
Retrieve the information from the simple_lock_valid_violation function by displaying the preserved message buffer.
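When panic appears more than once in a stack trace, the call that matters is the first one in time. Because dbx prints the most recent frame first, that call is the panic frame with the highest frame number. The following sketch, using the function names from the SMP stack trace above, returns the frames that led to that first call. The helper name frames_before_first_panic is hypothetical.

```python
# Function names from the SMP stack trace above, frame 0 (most recent) first.
SMP_TRACE = [
    "boot", "panic", "simple_lock_fault", "simple_lock_hierarchy_violation",
    "xnaintr", "_XentInt", "swap_ipl", "boot", "panic", "simple_lock_fault",
    "simple_lock_valid_violation", "pgrp_ref", "exit", "rexit", "syscall",
    "_Xsyscall",
]


def frames_before_first_panic(names):
    """Return the frames that led to the earliest panic call.

    names lists functions with the most recent frame first, so the earliest
    call to panic is the last occurrence of "panic" in the list.
    """
    if "panic" not in names:
        return []
    last = len(names) - 1 - names[::-1].index("panic")
    return names[last + 1:]


if __name__ == "__main__":
    for name in frames_before_first_panic(SMP_TRACE):
        print(name)
```

For this trace, the first functions returned are simple_lock_fault and simple_lock_valid_violation, which report the lock error, followed by pgrp_ref, the function the example identifies as the source of the problem.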