[Return to Library]  [TOC]  [PREV]  SECT--  [NEXT]  [INDEX] [Help]

5    Crash Analysis Examples

Finding problems in crash dump files is a task that takes practice and experience to do well. Exactly how you determine what caused a crash varies depending on how the system crashed. The cause of some crashes is relatively easy to determine, while finding the cause of other crashes is difficult and time-consuming.

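This chapter helps you analyze crash dump files by providing guidelines for examining crash dump files (Section 5.1) and examples that show how to identify a crash caused by a software problem (Section 5.2), identify a hardware exception (Section 5.3), find a panic string in a thread other than the current thread (Section 5.4), and identify the cause of a crash on an SMP system (Section 5.5).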

For information about how crash dump files are created, see Chapter 4.



5.1    Guidelines for Examining Crash Dump Files

There is no single way to determine the cause of a system crash from crash dump files. However, following these steps should help you identify the events that led to most crashes:

  1. Gather some facts about the system; for example, operating system type, version number, revision level, and hardware configuration.

  2. Locate the thread executing at the time of the crash. Most likely, this thread contains the events that led to the panic.

  3. Look at the panic string, if one exists. This string is contained in the preserved message buffer (pmsgbuf) and in the panicstr global variable. The panic string gives a reason for the crash.

  4. Identify the function that called the panic or trap function. That function is the one that caused the system to crash.

  5. Examine the source code for the function that caused the crash to infer the error that caused the crash. You might also need to examine related data structures and functions that appear earlier in the stack. An earlier function might have passed corrupt data to the function that caused a crash.

  6. Determine whether you can fix the problem.

    If the system crashed because of a hardware problem (for example, because a memory board became corrupt), correcting the problem probably requires repairing or replacing the hardware. You might be able to disconnect the hardware that caused the problem and operate without it until it is repaired or replaced. If you need to repair or replace Digital hardware, call the nearest Digital service center or sales office.

    If a software panic caused the crash, you can fix the problem if it is in software you or someone else at your company wrote. Otherwise, you must request that the producer of the software fix the problem. If the problem is in software from Digital, file a Software Performance Report (SPR) to request a correction to the Digital software.

    For information about reporting problems to Digital, contact your local Digital service center or sales office.



5.2    Identifying a Crash Caused by a Software Problem

When software encounters a state from which it cannot continue, it calls the system panic function. For example, if the software attempts to access an area of memory that is protected from access, the software might call the panic function and crash the system.

In most cases, only system programmers can fix the problem that caused a panic because most panics are caused by software errors. However, some system panics reflect other problems. For example, if a memory board becomes corrupted, software that attempts to write to that board might call the panic function and crash the system. In this case, the solution might be to replace the memory board and reboot the system.

The sections that follow demonstrate finding the cause of a software panic using the dbx and kdbx debuggers. You can also examine output from the crashdc crash data collection tool to help you determine the cause of a crash. Sample output from crashdc is shown and explained in Appendix A.



5.2.1    Using dbx to Determine the Cause of a Software Panic

The following example shows a method for identifying a software panic with the dbx debugger:

# dbx -k vmunix.0 vmcore.0
dbx version 3.11.1
Type 'help' for help.

stopped at  [boot:753 ,0xfffffc00003c4ae4]	 Source not available


(dbx) p panicstr     (1)
0xfffffc000044b648 = "ialloc: dup alloc"

(dbx) t              (2)
>  0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep.\
c":753, 0xfffffc00003c4ae4]
   1 panic(s = 0xfffffc000044b618 = "mode = 0%o, inum = %d, pref = %d fs = %s\n")\
 ["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002bdbb0]
   2 ialloc(pip = 0xffffffff8c6acc40, ipref = 57664, mode = 0, ipp = 0xffffffff8c\
f95af8) ["../../../../src/kernel/ufs/ufs_alloc.c":501, 0xfffffc00002dab48]
   3 maknode(vap = 0xffffffff8cf95c50, ndp = 0xffffffff8cf922f8, ipp = 0xffffffff\
8cf95b60) ["../../../../src/kernel/ufs/ufs_vnops.c":2842, 0xfffffc00002ea500]
   4 ufs_create(ndp = 0xffffffff8cf922f8, vap = 0xfffffc00002fe0a0) ["../../../..\
/src/kernel/ufs/ufs_vnops.c":602, 0xfffffc00002e771c]
   5 vn_open(ndp = 0xffffffff8cf95d18, fmode = 4618, cmode = 416) ["../../../../s\
rc/kernel/vfs/vfs_vnops.c":258, 0xfffffc00002fe138]
   6 copen(p = 0xffffffff8c6efba0, args = 0xffffffff8cf95e50, retval = 0xffffffff\
8cf95e40, compat = 0) ["../../../../src/kernel/vfs/vfs_syscalls.c":1379, 0xfffffc\
00002fb890]
   7 open(p = 0xffffffff8cf95e40, args = (nil), retval = 0x7f4) ["../../../../src\
/kernel/vfs/vfs_syscalls.c":1340, 0xfffffc00002fb7bc]
   8 syscall(ep = 0xffffffff8cf95ef8, code = 45) ["../../../../src/kernel/arch/al\
pha/syscall_trap.c":532, 0xfffffc00003cfa34]
   9 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":703, 0xfffffc00003\
c31e0]

(dbx) q

  1. Display the panic string (panicstr). The panic string shows that the ialloc function called the panic function.

  2. Perform a stack trace. This confirms that the ialloc function at line 501 in file ufs_alloc.c called the panic function.
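
The second step above can also be applied to a saved copy of the stack trace. The following Python sketch is only an illustration and is not part of dbx: it assumes the trace text has the layout shown in this example (a frame number, a function name, and a ["file":line, address] location, with long lines wrapped by a trailing backslash). The names parse_frames and panic_caller are hypothetical.

```python
import re

# Start of a dbx frame line, such as ">  0 boot(" or "   2 ialloc(".
FRAME_RE = re.compile(r"^>?\s*(\d+)\s+(\w+)\(")
# Source location, such as ["../ufs/ufs_alloc.c":501, 0xfffffc00002dab48]
LOC_RE = re.compile(r'\["([^"]+)":(\d+),')


def parse_frames(trace):
    """Return (level, function, file, line) tuples from saved `t` output."""
    # Long lines are wrapped with a trailing backslash; rejoin them first.
    joined = trace.replace("\\\n", "")
    frames = []
    for line in joined.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue
        loc = LOC_RE.search(line)
        path, lineno = (loc.group(1), int(loc.group(2))) if loc else (None, None)
        frames.append((int(m.group(1)), m.group(2), path, lineno))
    return frames


def panic_caller(frames):
    """Return the frame listed directly after the first panic frame, or None."""
    for i, frame in enumerate(frames):
        if frame[1] == "panic" and i + 1 < len(frames):
            return frames[i + 1]
    return None


# First three frames of the trace shown in this section.
SAMPLE = r'''>  0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep.\
c":753, 0xfffffc00003c4ae4]
   1 panic(s = 0xfffffc000044b618 = "mode = 0%o, inum = %d, pref = %d fs = %s\n")\
 ["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002bdbb0]
   2 ialloc(pip = 0xffffffff8c6acc40, ipref = 57664, mode = 0, ipp = 0xffffffff8c\
f95af8) ["../../../../src/kernel/ufs/ufs_alloc.c":501, 0xfffffc00002dab48]
'''

caller = panic_caller(parse_frames(SAMPLE))
print(caller[1], caller[3])  # ialloc 501
```

The sketch reports only the frame that called panic. As the guidelines in Section 5.1 note, you might still need to examine functions that appear earlier in the stack.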



5.2.2    Using kdbx to Determine the Cause of a Software Panic

The following example shows a method for finding the cause of a software panic with the kdbx debugger:

# kdbx -k vmunix.3 vmcore.3
dbx version 3.11.1
Type 'help' for help.

stopped at  [boot:753 ,0xfffffc00003c4b04]	 Source not available


(kdbx) sum        (1)
Hostname : system.dec.com
cpu: DEC3000 - M500       avail: 1
Boot-time:      Mon Dec 14 12:06:31 1992
Time:   Mon Dec 14 12:17:16 1992
Kernel : OSF1 release 1.2 version 1.2 (alpha)

(kdbx) p panicstr (2)
0xfffffc0000453ea0 = "wdir: compact2"

(kdbx) t          (3) 
>  0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep\
.c":753, 0xfffffc00003c4b04]
  1 panic(s = 0xfffffc00002e0938 = "p") ["../../../../src/kernel/bsd/subr_prf.c"\
:1119, 0xfffffc00002bdbb0]
  2 direnter(ip = 0xffffffff00000000, ndp = 0xffffffff9d38db60) ["../../../../sr\
c/kernel/ufs/ufs_lookup.c":986, 0xfffffc00002e2adc]
  3 ufs_mkdir(ndp = 0xffffffff9d38a2f8, vap = 0x100000020) ["../../../../src/ker\
nel/ufs/ufs_vnops.c":2383, 0xfffffc00002e9cbc]
  4 mkdir(p = 0xffffffff9c43d7c0, args = 0xffffffff9d38de50, retval = 0xffffffff\
9d38de40) ["../../../../src/kernel/vfs/vfs_syscalls.c":2579, 0xfffffc00002fd930]
  5 syscall(ep = 0xffffffff9d38def8, code = 136) ["../../../../src/kernel/arch/a\
lpha/syscall_trap.c":532, 0xfffffc00003cfa54]
  6 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":703, 0xfffffc00003\
c3200]

(kdbx) q
dbx (pid 29939) died.  Exiting...

  1. Use the sum command to get a summary of the system.

  2. Display the panic string (panicstr).

  3. Perform a stack trace of the current thread. The stack trace shows that the direnter function, at line 986 in file ufs_lookup.c, called the panic function.



5.3    Identifying a Hardware Exception

Occasionally, your system might crash due to a hardware error. During a hardware exception, the hardware encounters a situation from which it cannot continue. For example, the hardware might detect a parity error in a portion of memory that is necessary for its successful operation. When a hardware exception occurs, the hardware stores information in registers and stops operation. When control returns to the software, it normally calls the panic function and the system crashes.

The sections that follow show how to identify hardware traps using the dbx and kdbx debuggers. You can also examine output from the crashdc crash data collection tool to help you determine the cause of a crash. Sample output from crashdc is shown and explained in Appendix A.



5.3.1    Using dbx to Determine the Cause of a Hardware Error

The following example shows a method for identifying a hardware trap with the dbx debugger:

# dbx -k vmunix.1 vmcore.1
dbx version 3.11.1
Type 'help' for help.

(dbx) sh strings vmunix.1 | grep '(Rev'      (1)
DEC OSF/1 X2.0A-7  (Rev. 1);


(dbx) p utsname        (2)
struct {
    sysname = "OSF1"
    nodename = "system.dec.com"
    release = "2.0"
    version = "2.0"
    machine = "alpha"
}


(dbx) p panicstr       (3)
0xfffffc0000489350 = "trap: Kernel mode prot fault\n"


(dbx) t                (4)
>  0 boot(paniced = 0, arghowto = 0) ["/usr/sde/alpha/build/alpha.nightly/src/ker\
nel/arch/alpha/machdep.c":
    1 panic(s = 0xfffffc0000489350 = "trap: Kernel mode prot fault\n") ["/usr/sde\
/alpha/build/alpha.nightly/src/kernel/bsd/subr_prf.c":1099, 0xfffffc00002c0730]
   2 trap() ["/usr/sde/alpha/build/alpha.nightly/src/kernel/arch/alpha/trap.c":54\
4, 0xfffffc00003e0c78]
   3 _XentMM() ["/usr/sde/alpha/build/alpha.nightly/src/kernel/arch/alpha/locore.\
s":702, 0xfffffc00003d4ff4]


(dbx) kps              (5)
  PID   COMM
00000   kernel idle
00001   init
00002   device server
00003   exception hdlr
00663   ypbind
00018   cfgmgr
00219   automount

.
.
.
00265   cron
00293   xdm
02311   inetd
00278   lpd
01443   csh
01442   rlogind
01646   rlogind
01647   csh

(dbx) p $pid           (6)
2311

(dbx) p *pmsgbuf       (7)
struct {
    msg_magic = 405601
    msg_bufx = 62
    msg_bufr = 3825
    msg_bufc = "nknown flag
printstate: unknown flag
printstate: unknown flag
de: table is full
<3>vnode: table is full
.
.
.
<3>arp: local IP address 0xffffffff82b40429 in use by hardware address 08:00:2B:20:19:CD
<3>arp: local IP address 0xffffffff82b40429 in use by hardware address 08:00:2B:2B:F6:3B
va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c
panic: trap: Kernel mode prot fault
syncing disks... 3 3 done
printstate: unknown flag
printstate: unknown flag
printstate: unknown flag
printstate: unknown flag
printstate: u"
}

(dbx) px savedefp
0xffffffff89b2b4e0

(dbx) p savedefp
0xffffffff89b2b4e0

(dbx) p savedefp[28]
18446739675666356012

(dbx) px savedefp[28]          (8)
0xfffffc000032972c

(dbx) savedefp[28]/i           (9)
  [nfs_putpage:2344, 0xfffffc000032972c]        ldl     r5, 40(r1)

(dbx) savedefp[23]/i           (10)
  [ubc_invalidate:1768, 0xfffffc0000315fe0]     stl     r0, 84(sp)

(dbx) func nfs_putpage         (11)

(dbx) file                     (12)
/usr/sde/alpha/build/alpha.nightly/src/kernel/kern/sched_prim.c

(dbx) func ubc_invalidate      (13)
ubc_invalidate: Source not available

(dbx) file                     (14)
/usr/sde/alpha/build/alpha.nightly/src/kernel/vfs/vfs_ubc.c

(dbx) q

  1. You can use the sh command to pass commands to the shell. In this case, enter the strings and grep commands to extract the operating system revision number from the vmunix.1 file.

  2. Display the utsname structure to obtain more information about the operating system version.

  3. Display the panic string (panicstr). The panic function was called by a trap function.

  4. Perform a stack trace. This confirms that the trap function called the panic function. However, the stack trace does not show what caused the trap.

  5. Look to see what processes were running when the system crashed by entering the kps command.

  6. Display the process ID (PID) of the process that was running at the time of the crash. In this case, the PID is 2311, which the kps command output identifies as the inetd daemon.

  7. Display the preserved message buffer (pmsgbuf). Note that this buffer contains the program counter (pc) value, which is displayed in the following line:
    va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c

  8. Display register 28 of the exception frame pointer (savedefp) in hexadecimal. This register always contains the pc value, so you can obtain the pc value from either the preserved message buffer or register 28 of the exception frame pointer.

  9. Disassemble the instruction at the pc. At the time of the crash, the pc pointed to an instruction in the nfs_putpage function at line 2344.

  10. Disassemble the instruction at the return address, which is stored in register 23 of the exception frame pointer. At the time of the crash, the return address pointed to an instruction in the ubc_invalidate function at line 1768.

  11. Point the dbx debugger to the nfs_putpage function.

  12. Display the name of the source file that contains the nfs_putpage function.

  13. Point the dbx debugger to the ubc_invalidate function.

  14. Display the name of the source file that contains the ubc_invalidate function.

This example shows that the ubc_invalidate function, at line 1768 in the /vfs/vfs_ubc.c file, called the nfs_putpage function, and that the system stopped in nfs_putpage at line 2344. The dbx debugger reports /kern/sched_prim.c as the source file for nfs_putpage.
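
Steps 7 and 8 state that the pc value is available both in the preserved message buffer and in register 28 of the exception frame pointer. The following Python sketch cross-checks the two values from this example. It is an illustration only: pc_from_msgbuf is a hypothetical helper, and the pattern assumes the buffer logs the pc as pc= followed by hexadecimal digits, as in the line shown in step 7.

```python
import re


def pc_from_msgbuf(text):
    """Return the pc value logged in the message buffer text, or None."""
    # Accept the value with or without a leading 0x.
    m = re.search(r"\bpc=(?:0x)?([0-9a-fA-F]+)", text)
    return int(m.group(1), 16) if m else None


# Line copied from the pmsgbuf output in this example.
MSGBUF_LINE = "va=0000000000000028, status word=0000000000000000, pc=fffffc000032972c"
# Decimal value printed by: p savedefp[28]
SAVEDEFP_28 = 18446739675666356012

pc = pc_from_msgbuf(MSGBUF_LINE)
print(hex(pc))            # 0xfffffc000032972c
print(pc == SAVEDEFP_28)  # True
```

The decimal value that p savedefp[28] prints and the hexadecimal value that px savedefp[28] prints are the same number, which is why the example switches to px before disassembling.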



5.3.2    Using kdbx to Determine the Cause of a Hardware Error

The following example shows a method for identifying a hardware error by using the kdbx debugger:

# kdbx -k vmunix.5 vmcore.5
dbx version 3.11.1
Type 'help' for help.

stopped at  [boot:753 ,0xfffffc00003c4b04]	 Source not available

(kdbx) sum            (1)
Hostname : system.dec.com
cpu: DEC3000 - M500     avail: 1
Boot-time:      Thu Jan  7 08:12:30 1993
Time:   Thu Jan  7 08:13:23 1993
Kernel : OSF1 release 1.2 version 1.2 (alpha)

(kdbx) p panicstr     (2)
0xfffffc0000471030 = "ECC Error"

(kdbx) t              (3)
>  0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/machdep.\
c":753, 0xfffffc00003c4b04]
  1 panic(s = 0x670) ["../../../../src/kernel/bsd/subr_prf.c":1119, 0xfffffc00002\
bdbb0]
  2 kn15aa_machcheck(type = 1648, cmcf = 0xfffffc00000f8050 = , framep = 0xffff\
ffff94f79ef8) ["../../../../src/kernel/arch/alpha/hal/kn15aa.c":1269, 0xfffffc000\
03da62c]
  3 mach_error(type = -1795711240, phys_logout = 0x3, regs = 0x6) ["../../../../s\
rc/kernel/arch/alpha/hal/cpusw.c":323, 0xfffffc00003d7dc0]
  4 _XentInt() ["../../../../src/kernel/arch/alpha/locore.s":609, 0xfffffc00003c3\
148]

(kdbx) q
dbx (pid 337) died.  Exiting...

  1. Use the sum command to get a summary of the system.

  2. Display the panic string (panicstr).

  3. Perform a stack trace. Because the kn15aa_machcheck function (which is a hardware checking function) called the panic function, the system crash was probably the result of a hardware error.



5.4    Finding a Panic String in a Thread Other Than the Current Thread

The dbx and kdbx debuggers have the concept of the current thread. In many cases, when you invoke one of the debuggers to analyze a crash dump, the panic string is in the current thread. At times, however, the current thread contains no panic string and so is probably not the thread that caused the crash.

The following example shows a method for stepping through kernel threads to identify the events that led to the crash:


# dbx -k ./vmunix.2 ./vmcore.2
dbx version 3.11.1
Type 'help' for help.
thread 0x8d431c68 stopped at  [thread_block:1305 +0x114,0xfffffc000033961c]   \
 Source not available

(dbx) p panicstr        (1)
0xfffffc000048a0c8 = "kernel memory fault"

(dbx) t                 (2)

>  0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":1305, 0xfffffc0\
00033961c]
   1 mpsleep(chan = 0xffffffff8d4ef450 = , pri = 282, wmesg = 0xfffffc000046f\
290 = "network", timo = 0, lockp = (nil), flags = 0) ["../../../../src/kernel/\
bsd/kern_synch.c":267, 0xfffffc00002b772c]
   2 sosleep(so = 0xffffffff8d4ef408, addr = 0xffffffff906cfcf4 = "^P", pri = 2 \
82,tmo = 0) ["../../../../src/kernel/bsd/uipc_socket2.c":612, 0xfffffc00002d3784]
   3 accept1(p = 0xffffffff8f8bfde8, args = 0xffffffff906cfe50, retval = 0xffff \
ffff906cfe40, compat_43 = 1) ["../../../../src/kernel/bsd/uipc_syscalls.c":300 \
, 0xfffffc00002d4c74]
   4 oaccept(p = 0xffffffff8d431c68, args = 0xffffffff906cfe50, retval = 0xffff \
ffff906cfe40) ["../../../../src/kernel/bsd/uipc_syscalls.c":250, 0xfffffc00002d\
4b0c]
   5 syscall(ep = 0xffffffff906cfef8, code = 99, sr = 1) ["../../../../src/kern \
el/arch/alpha/syscall_trap.c":499, 0xfffffc00003ec18c]
   6 _Xsyscall() ["../../../../src/kernel/arch/alpha/locore.s":675, 0xfffffc000\
03df96c]

(dbx) tlist             (3)
thread 0x8d431a60 stopped at   [thread_block:1305 +0x114,0xfffffc000033961c]   \
Source not available
thread 0x8d431858 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
thread 0x8d431650 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
thread 0x8d431448 stopped at   [thread_block:1305 +0x114,0xfffffc000033961c]   \
Source not available
thread 0x8d431240 stopped at   [thread_block:1305 +0x114,0xfffffc000033961c]   \
Source not available

.
.
.
thread 0x8d42f5d0 stopped at   [boot:696 ,0xfffffc00003e119c]  Source not \
available
thread 0x8d42f3c8 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
thread 0x8d42f1c0 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
thread 0x8d42efb8 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
thread 0x8d42dd70 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available

(dbx) tset 0x8d42f5d0   (4)
thread 0x8d42f5d0 stopped at  [boot:696 ,0xfffffc00003e119c]   Source not ava\
ilable

(dbx) t                 (5)
>  0 boot(paniced = 0, arghowto = 0) ["../../../../src/kernel/arch/alpha/mac\
hdep.c":694, 0xfffffc00003e1198]
   1 panic(s = 0xfffffc000048a098 = " sp contents at time of fault: 0x%l01\
6x\r\n\n") ["../../../../src/kernel/bsd/subr_prf.c":1110, 0xfffffc00002beef4]
   2 trap() ["../../../../src/kernel/arch/alpha/trap.c":677, 0xfffffc00003ecc70]
   3 _XentMM() ["../../../../src/kernel/arch/alpha/locore.s":828, 0xfffffc000\
03dfb1c]
   4 pmap_release_page(pa = 18446744071785586688) ["../../../../src/kernel/ar\
ch/alpha/pmap.c":640, 0xfffffc00003e3ecc]
   5 put_free_ptepage(page = 5033216) ["../../../../src/kernel/arch/alpha/pma\
p.c" :534, 0xfffffc00003e3ca0]
   6 pmap_destroy(map = 0xffffffff8d5bc428) ["../../../../src/kernel/arch/alp\
ha/p map.c":1891, 0xfffffc00003e6140]
   7 vm_map_deallocate(map = 0xffffffff81930ee0) ["../../../../src/kernel/vm/\
vm_map.c":482, 0xfffffc00003d03c0]
   8 task_deallocate(task = 0xffffffff8d568d48) ["../../../../src/kernel/kern\
/task.c":237, 0xfffffc000033c1dc]
   9 thread_deallocate(thread = 0x4e4360) ["../../../../src/kernel/kern/threa\
d.c":689, 0xfffffc000033d83c]
  10 reaper_thread() ["../../../../src/kernel/kern/thread.c":1952, 0xfffffc00\
0033e920]
  11 reaper_thread() ["../../../../src/kernel/kern/thread.c":1901, 0xfffffc00\
0033e8ac]

(dbx) q

  1. Display the panic string (panicstr) to view the panic message, if any. This message indicates that a memory fault occurred.

  2. Perform a stack trace of the current thread. Because this thread does not show a call to the panic function, you need to look at other threads.

  3. Examine the system's threads. The thread most likely to contain the panic is the boot thread because the boot function always executes immediately before the system crashes. If the boot thread does not exist, you must examine every thread of every process in the process list.

  4. Point dbx to the boot thread at address 0x8d42f5d0.

  5. Perform a stack trace of the boot thread. In this example, the problem is in the pmap_release_page function at line 640 of the pmap.c file.
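
When the tlist output is long, you can search a saved copy of it for the boot thread rather than reading every line. The following Python sketch is an illustration under that assumption and is not a dbx feature. The name threads_stopped_in is hypothetical, and the pattern assumes the "thread ADDRESS stopped at [function:line" layout shown in this example.

```python
import re

# Matches entries such as: thread 0x8d42f5d0 stopped at   [boot:696 ,0x...]
THREAD_RE = re.compile(r"thread (0x[0-9a-fA-F]+) stopped at\s+\[(\w+):(\d+)")


def threads_stopped_in(tlist_output, function="boot"):
    """Return the addresses of threads stopped in the named function."""
    return [addr
            for addr, func, _line in THREAD_RE.findall(tlist_output)
            if func == function]


# Three entries copied from the tlist output in this example.
SAMPLE = r'''thread 0x8d431a60 stopped at   [thread_block:1305 +0x114,0xfffffc000033961c]   \
Source not available
thread 0x8d42f5d0 stopped at   [boot:696 ,0xfffffc00003e119c]  Source not \
available
thread 0x8d42f3c8 stopped at   [thread_block:1289 +0x18,0xfffffc00003394b8]    \
Source not available
'''

print(threads_stopped_in(SAMPLE))  # ['0x8d42f5d0']
```

The address that the sketch returns is the value to pass to the tset command in step 4. If the list is empty, no boot thread exists and you must examine the threads individually, as step 3 describes.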



5.5    Identifying the Cause of a Crash on an SMP System

If you are analyzing crash dump files from an SMP system, you must first determine on which CPU the panic occurred. You can then continue crash dump analysis as you would on a single processor system.

The following example shows a method for determining which CPU caused the crash and which function called the panic function:


% dbx -k ./vmunix.1 ./vmcore.1
dbx version 3.11.6
Type 'help' for help.
stopped at  [boot:1494 ,0xfffffc0000442918]	 Source not available

(dbx) p utsname    (1)
struct {
    sysname = "OSF1"
    nodename = "wasted.zk3.dec.com"
    release = "V3.0"
    version = "358"
    machine = "alpha"
}


(dbx) print paniccpu   (2)

0 

(dbx) p machine_slot[1] (3)
struct {
    is_cpu = 1
    cpu_type = 15
    cpu_subtype = 3
    running = 1
    cpu_ticks = {
        [0] 416162
        [1] 83260
        [2] 1401080
        [3] 11821212
        [4] 1095581
    }
    clock_freq = 1024
    error_restart = 0
    cpu_panicstr = 0xfffffc000059f6a0 = "cpu_ip_intr: panic request"
    cpu_panic_thread = 0xffffffff8109a780
}


(dbx) p panicstr   (4)
0xfffffc0000558ad0 = "simple_lock: uninitialized lock"

(dbx) tset active_threads[paniccpu]   (5)

stopped at  [boot:1494 ,0xfffffc0000442918]

(dbx) t   (6)
>  0 boot(0x0, 0x4, 0xac35c0000000a, 0xfffffc00004403fc, 0xfffffc000000000e) \
["../../../../src/kernel/arch/alpha/machdep.c":1494, 0xfffffc0000442918]
   1 panic(s = 0xfffffc0000558b40 = "simple_lock: hierarchy violation") ["../\
   2 simple_lock_fault(slp = 0xfffffc00006292f0, state = 0, caller = 0xfffffc\
000046f384, arg = 0xfffffc0000534fd8 = "session.s_fpgrp_lock", fmt = 0xfffffc\
0000558de8 = "    class already locked: %s\n", error = 0xfffffc0000558b40 = "\
simple_lock: hierarchy violation") ["../../../../src/kernel/kern/lock.c":1558\
, 0xfffffc00003c34ec]
   3 simple_lock_hierarchy_violation(slp = 0xfffffc000046f384, state = 184467\
39675668500440, caller = 0xfffffc0000558de8, curhier = 5606208) ["../../../..\
/src/kernel/kern/lock.c":1616, 0xfffffc00003c3620]
   4 xnaintr(0xfffffc00005a5158, 0x2, 0xffffffffb53ef238, 0xfffffc000068a754,\
 0xfffffc000055891d) ["../../../../src/kernel/io/dec/netif/if_xna.c":1077, 0x\
fffffc000046f384]
   5 _XentInt(0x2, 0xfffffc0000447174, 0xfffffc00005b7d40, 0x2, 0x0) ["../../\
   6 swap_ipl(0x2, 0xfffffc0000447174, 0xfffffc00005b7d40, 0x2, 0x0) ["../../\
   7 boot(0x0, 0x0, 0xffffffffa52c6000, 0xffffffffb53ef1f8, 0xfffffc00003bf4f\
c) ["../../../../src/kernel/arch/alpha/machdep.c":1434, 0xfffffc000044280c]
   8 panic(s = 0xfffffc0000558ad0 = "simple_lock: uninitialized lock") ["../.\
   9 simple_lock_fault(slp = 0xffffffffa52c6000, state = 1719, caller = 0xfff\
ffc00003734c4, arg = (nil), fmt = (nil), error = 0xfffffc0000558ad0 = "simple\
_lock: uninitialized lock") ["../../../../src/kernel/kern/lock.c":1558, 0xfff\
ffc00003c34ec]
  10 simple_lock_valid_violation(slp = 0xfffffc00003734c4, state = 0, caller \
= (nil)) ["../../../../src/kernel/kern/lock.c":1584, 0xfffffc00003c3578]
  11 pgrp_ref(0xffffffffa52c6000, 0x0, 0xfffffc000023ee20, 0x6b7, 0xfffffc000\
05e1080) ["../../../../src/kernel/bsd/kern_proc.c":561, 0xfffffc00003734c4]
  12 exit(0xffffffffb53ef740, 0x100, 0x1, 0xffffffffa42e5e80, 0x1) ["../../..\
/../src/kernel/bsd/kern_exit.c":868, 0xfffffc000023ef30]
  13 rexit(0xffffffff814d2d80, 0xffffffffb53ef758, 0xffffffffb53ef8b8, 0x1000\
00001, 0x0) ["../../../../src/kernel/bsd/kern_exit.c":546, 0xfffffc000023e7dc]
  14 syscall(0xffffffffb53ec000, 0xfffffc000068a300, 0x0, 0x51, 0x1) ["../../\
  15 _Xsyscall(0x8, 0x3ff800e6938, 0x14000d0f0, 0x1, 0x11ffffc18) ["../../../\

(dbx) p *pmsgbuf  (7)
struct {
    msg_magic = 405601
    msg_bufx = 701
    msg_bufr = 134
    msg_bufc = "0.64.143, errno 22
NFS server: stale file handle fs(742,645286) file 573 gen 32779
 getattr, client address = 16.140.64.143, errno 22

simple_lock: uninitialized lock

    pc of caller:         0xfffffc00003734c4
    lock address:         0xffffffffa52c6000
    lock class name:      (unknown_simple_lock)
    current lock state:   0x00000000e0e9b04a (cpu=0,pc=0xfffffc00e0e9b048,free)

panic (cpu 0): simple_lock: uninitialized lock

simple_lock: hierarchy violation

    pc of caller:         0xfffffc000046f384
    lock address:         0xfffffc00006292f0
    lock info addr:       0xfffffc0000672cc0
    lock class name:      xna_softc.lk_xna_softc
    class already locked: session.s_fpgrp_lock

.
.
.
}

(dbx) quit

  1. Display the utsname structure to obtain information about the system.

  2. Display the number of the CPU on which the panic occurred. In this case, CPU 0 started the system panic.

  3. Display the machine_slot structure for a CPU other than the one that started the system panic. Notice that the panic string contains:
    cpu_ip_intr: panic request

    This panic string indicates that this CPU was not the one that started the system panic. This CPU was requested to panic and stop operation.

  4. Display the panic string, which in this case indicates that a process attempted to obtain an uninitialized lock.

  5. Set the context to the CPU that caused the system panic to begin.

  6. Perform a stack trace on the CPU that started the system panic.

    Notice that the panic function appears twice in the stack trace. The series of events that resulted in the first call to the panic function caused the crash. The events that occurred after the first call to the panic function were performed after the system was corrupt and during an attempt to save data. Normally, any events that occur after the initial call to the panic function will not help you determine why the system crashed.

    In this example, the problem is in the pgrp_ref function on line 561 in the kern_proc.c file.

    If you follow the stack trace after the pgrp_ref function, you can see that the pgrp_ref function calls the simple_lock_valid_violation function. This function displays information about simple locks, which might be helpful in determining why the system crashed.

  7. Retrieve the information from the simple_lock_valid_violation function by displaying the preserved message buffer.
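
Step 6 explains that when the panic function appears more than once in a stack trace, the first call is the one that matters, and that call is the one listed last in the trace. The following Python sketch applies that rule to the function names from this example. The names original_panic_path and first_non_reporting are hypothetical, and treating every function whose name starts with simple_lock_ as a reporting function is an assumption made for this example only.

```python
# Function names from the stack trace in this example, frame 0 first.
FRAMES = [
    "boot", "panic", "simple_lock_fault", "simple_lock_hierarchy_violation",
    "xnaintr", "_XentInt", "swap_ipl", "boot", "panic", "simple_lock_fault",
    "simple_lock_valid_violation", "pgrp_ref", "exit", "rexit", "syscall",
    "_Xsyscall",
]

# Assumption for this example only: these functions report a lock problem
# rather than cause it.
REPORTING_PREFIXES = ("simple_lock_",)


def original_panic_path(functions):
    """Return the frames that led to the first (earliest) call to panic."""
    panic_levels = [i for i, name in enumerate(functions) if name == "panic"]
    if not panic_levels:
        return []
    # The earliest call has the highest frame number in the trace.
    return functions[panic_levels[-1] + 1:]


def first_non_reporting(path):
    """Return the first function in path that is not a reporting function."""
    for name in path:
        if not name.startswith(REPORTING_PREFIXES):
            return name
    return None


path = original_panic_path(FRAMES)
print(path[0])                    # simple_lock_fault
print(first_non_reporting(path))  # pgrp_ref
```

The frames before the last panic entry (xnaintr, swap_ipl, and so on) ran after the system was already panicking, so the sketch discards them.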