To ensure that you can analyze crash dump files following a system crash, you must understand how crash dump files are created. You must reserve space on disks for the crash dump and crash dump files. The amount of space you reserve depends on your system configuration and the type of crash dump you want the system to perform.
This chapter gives the following information to help you manage crash dumps and crash dump files:
For information about analyzing the contents of crash dump files, see Chapter 5.
Before the system writes a crash dump, it determines how the dump fits into the swap partitions. The following list describes how the system determines where to write the crash dump:
If the aggregate size of all the swap partitions is too small to contain the crash dump, the system creates no crash dump.
Note
Each crash dump contains a header, which the system always writes to the end of the primary swap partition. The header contains information about the size of the dump and where the dump is stored. This information allows the system to find and save the dump at system reboot time.
You can configure the system so that it fills the secondary swap partitions with dump information before writing any information (except the dump header) to the primary swap partition. The attribute that you use to configure where crash dumps are written first is the dump_sp_threshold attribute.
Figure 4-1 shows the default setting of the dump_sp_threshold attribute for a 40 MB swap partition.
The system can write 38 MB of dump information to the primary swap partition shown in Figure 4-1. Therefore, a 30 MB dump fits on the primary swap partition and is written to that partition. However, a 40 MB dump is too large; the system writes the crash dump header to the end of the primary swap partition and writes the rest of the crash dump to secondary swap partitions.
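The placement rule in this example can be sketched as a small shell function. This is a simplified model, not the kernel's actual logic. It assumes that sizes are whole megabytes, that the threshold is the part of the primary swap partition kept free of dump data, and that the 2 MB threshold is implied by the 40 MB and 38 MB figures above. The dump_target name is hypothetical.

```shell
#!/bin/sh
# Simplified model of where the system writes the body of a crash dump.
# Assumptions: sizes are whole megabytes, and the threshold is the part of
# the primary swap partition that is kept free of dump data.
# dump_target is a hypothetical helper, not a system command.
dump_target() {
    primary_mb=$1
    threshold_mb=$2
    dump_mb=$3
    # Space in the primary swap partition that dump data may occupy.
    usable_mb=$((primary_mb - threshold_mb))
    if [ "$dump_mb" -le "$usable_mb" ]; then
        echo primary
    else
        # Only the dump header goes to the end of the primary partition.
        echo secondary
    fi
}

dump_target 40 2 30    # prints "primary"
dump_target 40 2 40    # prints "secondary"
```

In this model, a threshold equal to the size of the primary swap partition leaves no usable space there, so every dump goes to the secondary swap partitions first.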
Setting the dump_sp_threshold attribute to a high value causes the system to fill the secondary swap partitions before it writes dump information to the primary swap partition. For example, if you set the dump_sp_threshold attribute to a value that is equal to the size of the primary swap partition, the system fills the secondary swap partitions first. (Setting the dump_sp_threshold attribute is described in Section 4.3.3.) Figure 4-2 illustrates how a crash dump is written to secondary swap partitions on multiple devices.
If the crash dump fills partition e in Figure 4-2, the system writes the remaining crash dump information to the end of the primary swap partition. Note that the system fills as much of the primary swap partition as is necessary to store the entire dump. The dump is written to the end of the primary swap partition to attempt to protect it from system swapping. However, the dump can fill the entire primary swap partition and might be corrupted by swapping that occurs as the system reboots.
A partial crash dump contains the following:
The system writes the part of physical memory believed to contain significant information at the time of the system crash. By default, the system omits user page table entries.
A full crash dump contains the following:
If you want the system to include user page tables in partial crash dumps, set the value of the dump-user-pte-pages attribute to 1. The dump-user-pte-pages attribute is in the vm subsystem. The following example shows the command you issue to set this attribute:
# sysconfig -r vm dump-user-pte-pages = 1
To set this console environment variable, shut down and halt your system. At the console prompt, enter the following command:
>>> set boot_osflags d
The boot_osflags variable controls other boot options, such as whether the system boots to single-user mode or multiuser mode; therefore, use care when setting this variable. For more information about boot_osflags, see the System Administration manual.
(dbx) a partial_dump = 0
Because crash dumps are written to the swap partitions on your system, you allow space for crash dumps by adjusting the size of your swap partitions. For information about modifying the size of swap partitions, see the System Administration manual and the Installation Guide.
Be sure to list all swap partitions in the /etc/fstab file. The savecore command, which copies the crash dump from swap partitions to a file, uses the information in the /etc/fstab file to find the swap partitions. If you omit a swap partition from /etc/fstab, the savecore command might be unable to find the omitted partition.
The sections that follow give guidelines for estimating the amount of space required for partial and full crash dumps. They also describe how to set the dump_sp_threshold attribute.
If your swap partitions are too small to store a partial crash dump, the system creates no crash dump. Therefore, overestimate the amount of space you need and adjust the amount of space you allocate to saving crash dumps, if necessary, after your system creates a few crash dumps.
Because crash dumps are about the same size as crash dump files, you can determine how large a crash dump was by examining the size of the resulting crash dump file. For example, to determine how large the first crash dump file created by your system is, issue the following command:
# ls -s /var/adm/crash/vmcore.0
20480 vmcore.0
This command displays the number of 512-byte blocks occupied by the crash dump file. In this case, the file occupies 20,480 blocks, so you know that the crash dump written to the swap partitions also occupied about 20,480 blocks. Be sure to use the ls -s command to display the size of crash dump files. The size that the ls -l command displays is incorrect. The ls -l command includes file "holes" in the size of the crash dump file. (See Section 4.6 for more information.)
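To compare a block count with the size of a swap partition, it can help to convert the count to megabytes. The following sketch assumes 512-byte blocks, as described above, and rounds down to a whole megabyte. The blocks_to_mb name is hypothetical.

```shell
#!/bin/sh
# Convert a count of 512-byte blocks to whole megabytes (rounded down).
# blocks_to_mb is a hypothetical helper, not a system command.
blocks_to_mb() {
    echo $(( $1 * 512 / 1048576 ))
}

blocks_to_mb 20480    # prints 10
```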
In some cases, a system contains so much active memory that it cannot store a crash dump on a single disk. For example, suppose your system contains 2 GB of memory and system activity level is high (uses most of memory). Crash dumps for this system are too large to fit on a single device. To cause crash dumps to spread across multiple disks, set the dump_sp_threshold attribute to a high value, as described in Section 4.3.3, and create secondary swap partitions on several disks. The system automatically writes dumps that are too large to fit in the primary swap partition to secondary swap partitions. The System Administration manual describes configuring swap space.
If your system contains a large amount (2 GB, for example) of memory, it might need to spread crash dumps across multiple disks. To cause crash dumps to spread across multiple disks, set the dump_sp_threshold attribute to a high value, as described in Section 4.3.3, and create secondary swap partitions on several disks. The system automatically writes dumps that are too large to fit in the primary swap partition to secondary swap partitions. The System Administration manual describes configuring swap space.
If you chose to have the system perform a full dump when it crashes and your swap partitions are too small to store a full dump, the system performs a partial dump.
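The fallback behavior described in these sections (a full dump falls back to a partial dump, and a partial dump that does not fit produces no dump) can be summarized in a short sketch. It is a simplified model that assumes sizes in whole megabytes and compares them only against the aggregate swap space. The dump_outcome name and its arguments are hypothetical.

```shell
#!/bin/sh
# Simplified model of which kind of crash dump the system ends up writing.
# Arguments: requested type (full or partial), aggregate swap size,
# size of a full dump, size of a partial dump (all sizes in MB).
# dump_outcome is a hypothetical helper, not a system command.
dump_outcome() {
    requested=$1
    swap_mb=$2
    full_mb=$3
    partial_mb=$4
    if [ "$requested" = full ] && [ "$full_mb" -le "$swap_mb" ]; then
        echo full
    elif [ "$partial_mb" -le "$swap_mb" ]; then
        # Either a partial dump was requested or the full dump did not fit.
        echo partial
    else
        echo none
    fi
}

dump_outcome full 100 96 80      # prints "full"
dump_outcome full 85 96 80       # prints "partial"
dump_outcome partial 64 96 80    # prints "none"
```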
To adjust the dump_sp_threshold attribute, issue the sysconfig command. For example, suppose your primary swap partition is 40 MB. To raise the value so that the system writes crash dumps to secondary partitions, issue the following command:
# sysconfig -r generic dump_sp_threshold=81920
In the preceding example, the dump_sp_threshold attribute, which is in the generic subsystem, is set to 81,920 512-byte blocks (40 MB). In this example, the system attempts to leave the entire primary swap partition open for system swapping. The system automatically writes the crash dump to secondary swap partitions and the crash dump header to the end of the primary swap partition.
The sysconfig command changes the value of system attributes for the currently running kernel. To store the new value of the dump_sp_threshold attribute in the sysconfigtab database, modify that database using the sysconfigdb command. For information about the sysconfigtab database and the sysconfigdb command, see the System Administration manual and the sysconfigdb(8) reference page.
You can invoke the savecore command from the command line. For information about the command syntax, see the savecore(8) reference page.
If a crash dump exists and the file system contains enough space to save the crash dump files, the savecore command moves the crash dump and a copy of the kernel into files in the default crash directory, /var/adm/crash. (You can modify the location of the crash directory, as described in Section 4.5.) The savecore command stores the kernel image in a file named vmunix.n, and it stores the contents of physical memory in a file named vmcore.n.
The n variable specifies the number of the crash. The number of the crash is recorded in the bounds file in the crash directory. After the first crash, the savecore command creates the bounds file and stores the number 1 in it. The command increments that value for each succeeding crash.
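The numbering scheme can be sketched as follows. This is an illustration only, not the savecore implementation. It assumes that the bounds file holds a single decimal number, that this number is the one used for the next crash, and that the first crash is numbered 0 (so that its files are vmunix.0 and vmcore.0). The function names are hypothetical.

```shell
#!/bin/sh
# Illustration of crash numbering with a bounds file.
# Assumption: bounds holds the number to use for the next crash.
# Both functions are hypothetical helpers, not system commands.
next_crash_number() {
    bounds_file=$1/bounds
    if [ -f "$bounds_file" ]; then
        cat "$bounds_file"
    else
        echo 0            # no bounds file yet: this is the first crash
    fi
}

record_crash() {
    n=$(next_crash_number "$1")
    echo "vmunix.$n vmcore.$n"         # names of the files for this crash
    echo $((n + 1)) > "$1/bounds"      # store the number for the next crash
}

dir=$(mktemp -d)
record_crash "$dir"    # prints "vmunix.0 vmcore.0"
record_crash "$dir"    # prints "vmunix.1 vmcore.1"
cat "$dir/bounds"      # prints 2
rm -rf "$dir"
```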
You can cause the savecore command to write the reboot message to another file by modifying the auth facility entry in the syslog.conf file. If you remove the auth entry from the syslog.conf file, the savecore command does not save the reboot message.
The savecore command saves the kernel message buffer in the /var/adm/crash/msgbuf.savecore file, by default. You can change the location to which savecore writes the kernel message buffer by modifying the msgbuf.err entry in the /etc/syslog.conf file. If you remove the msgbuf.err entry from the /etc/syslog.conf file, savecore does not save the kernel message buffer.
Later in the reboot process, the syslogd daemon starts up, reads the contents of the msgbuf.err file, and moves those contents into the /var/adm/syslog/kern.log file, as specified in the /etc/syslog.conf file. The syslogd daemon then deletes the msgbuf.err file. For more information about how system logging is performed, see the System Administration manual and the syslogd(8) reference page.
The savecore command saves the binary event buffer in the /usr/adm/crash/binlogdumpfile file by default. You can change the location to which savecore writes the binary event buffer by modifying the dumpfile entry in the /etc/binlog.conf file. If you remove the dumpfile entry from the /etc/binlog.conf file, savecore does not save the binary event buffer.
Later in the reboot process, the binlogd daemon starts up, reads the contents of the /usr/adm/crash/binlogdumpfile file, and moves those contents into the /usr/adm/binary.errlog file, as specified in the /etc/binlog.conf file. The binlogd daemon then deletes the binlogdumpfile file. For more information about how binary error logging is performed, see the System Administration manual and the binlogd(8) reference page.
For example, suppose you save partial crash dumps. Your system has 96 MB of memory, but your peak system activity level is 80 MB. You have reserved 85 MB of disk space for crash dumps and swapping. In this case, you should reserve 91 MB of space in the file system for storing crash dump files. You need to reserve considerably more space if you want to save files from more than one crash dump. If you want to save files from multiple crash dumps, consider compressing older crash dump files. See Section 4.6 for information about compressing and uncompressing partial crash dump files.
By default, savecore writes crash dump files to the /var/adm/crash directory. To reserve space for crash dump files in the default directory, you must mount the /var/adm/crash directory on a file system that has a sufficient amount of disk space. (For information about mounting file systems, see the System Administration manual and the mount(8) reference page.) If you expect your crash dump files to be large, you might need to use a Logical Storage Manager (LSM) file system to store crash dump files. For information about creating LSM file systems, see the Logical Storage Manager manual.
# savecore /usr/adm/crash2
Once savecore has saved the crash dump files, you can bring your system to multiuser mode.
Specifying a directory on the savecore command line changes the crash directory only for the duration of that command. If the system crashes later and the system startup script invokes the savecore script, savecore copies the crash dump to files in the default directory, which is normally /var/adm/crash.
You can control the default location of the crash directory with the rcmgr command. For example, to save crash dump files in the /usr/adm/crash2 directory by default (at each system startup), issue the following command:
# /usr/sbin/rcmgr set SAVECORE_DIR /usr/adm/crash2
If you want the system to return to multiuser mode, regardless of whether it saved a crash dump, issue the following command:
# /usr/sbin/rcmgr set SAVECORE_FLAGS M
If you compress a vmcore.n dump file from a partial crash dump, you must use care when you uncompress it. Using the uncompress command with no flags results in a vmcore.n file requiring space equal to the size of memory. In other words, the uncompressed file requires the same amount of disk space as a vmcore.n file from a full crash dump.
This situation occurs because the original vmcore.n file contains UNIX File System (UFS) file "holes." UFS files can contain regions, called holes, that have no associated data blocks. When a process, such as the uncompress command, reads from a hole in a file, the file system returns zero-valued data. Thus, memory omitted from the partial dump is added back into the uncompressed vmcore.n file as disk blocks containing all zeros.
To ensure that the uncompressed core file remains at its partial dump size, you must pipe the output from the uncompress command with the -c flag to the dd command with the conv=sparse option. For example, to uncompress a file named vmcore.0.Z, issue the following command:
# uncompress -c vmcore.0.Z | dd of=vmcore.0 conv=sparse
262144+0 records in
262144+0 records out
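The difference between the apparent size of a file and the space it occupies can be observed on a scratch file. The following sketch is not specific to crash dump files. It assumes a file system that supports holes and a dd command that extends the output file when count=0 is combined with seek. The helper names are hypothetical, and the block size that ls -s reports depends on the system.

```shell
#!/bin/sh
# Create a 1 MB file that consists entirely of a hole, then compare its
# apparent size with the space actually allocated to it.
# apparent_bytes and allocated_blocks are hypothetical helpers.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

allocated_blocks() {
    # The block unit reported by ls -s depends on the system.
    ls -s "$1" | awk '{print $1}'
}

dir=$(mktemp -d)
# Seek 1024 blocks of 1024 bytes into the output and write nothing.
dd if=/dev/zero of="$dir/sparse" bs=1024 count=0 seek=1024 2>/dev/null

echo "apparent size in bytes: $(apparent_bytes "$dir/sparse")"
echo "allocated blocks:       $(allocated_blocks "$dir/sparse")"
rm -rf "$dir"
```

A process that reads the scratch file receives 1 MB of zeros, which is why a copy made without preserving holes occupies the full apparent size.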
If your system hangs and you force a crash dump, the panic string recorded in the crash dump is the following:
hardware restart
This panic string is always the one recorded when system operation is interrupted by pressing the Halt button or Ctrl/P.