Crashdump Debugging
From openSUSE
Basics
Saving the dump is only the first step. You, or a developer who tries to debug your problem, need a program to read the dump. GDB can be used for this, but it has several disadvantages:
- GDB was written to debug userspace programs, not kernel dumps. It lacks features required to debug kernel dumps like a “task context” or accessing some widely used kernel structures in an efficient way.
- GDB lacks support for ELF64 on 32-bit platforms. Because of PAE, a 32-bit system may have more than 2^32 bytes (4 GiB) of memory, which an ELF32 dump cannot describe.
Because of this, and because GDB cannot read formats other than ELF dumps (for example, the compressed format from makedumpfile, kernel dumps produced by LKCD, or Xen dumps), the crash utility was implemented. It can be downloaded from [1] and is also included in openSUSE. The latest version is available in the openSUSE Build Service in the Kernel:kdump repository.
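The address-size argument can be checked with a few lines of shell arithmetic. This is only a sketch: the 36-bit width assumes the classic x86 PAE layout, and the helper name needs_elf64 is made up for illustration.

```shell
#!/bin/sh
# Largest byte count addressable with a 32-bit field (as in ELF32).
elf32_limit=$((1 << 32))
# Physical address space with PAE, assuming the classic 36-bit PAE width.
pae_limit=$((1 << 36))

# Succeeds if a machine with the given RAM size (in bytes) has more
# memory than a 32-bit address field can cover.
needs_elf64() {
    [ "$1" -gt "$elf32_limit" ]
}

echo "ELF32 limit: $((elf32_limit >> 30)) GiB"
echo "PAE limit:   $((pae_limit >> 30)) GiB"
```

A 32-bit machine with 8 GiB of RAM (needs_elf64 8589934592) falls into the range that only an ELF64 dump can cover.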
Debuginfo Packages
Debugging requires that the program, whether it is the kernel or a userspace program, was compiled with debugging information (the -g flag of gcc). Because debugging information makes the binaries large, most Linux distributions use the following approach:
- All programs are compiled with the -g flag to include debugging information.
- The debugging sections are stripped out from the binary.
- Instead, a so-called debug link is included in the main binary. The debug link contains the name of the file that holds the debug information (usually just binary.debug, so the debuginfo for the ls program is in ls.debug).
- The debug information of the binary is saved in an extra file in the /usr/lib/debug directory (plus the path for the binary, so the debug information for /bin/ls is in /usr/lib/debug/bin/ls.debug).
- The package that contains the debug information is called package-debuginfo.
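The steps in the list above can be reproduced by hand with binutils. The following is a minimal sketch, not necessarily what the distribution build system runs; the function name split_debuginfo is an assumption made for this example, while the objcopy options are the standard ones for separate debug files.

```shell
# Split the debugging sections of an ELF binary into <binary>.debug and
# leave a .gnu_debuglink section pointing at that file.
# $1: path to a binary that was compiled with -g
split_debuginfo() {
    bin=$1
    objcopy --only-keep-debug "$bin" "$bin.debug" &&   # copy debug sections out
    objcopy --strip-debug "$bin" &&                    # drop them from the binary
    objcopy --add-gnu-debuglink="$bin.debug" "$bin"    # record file name and CRC32
}
```

After running split_debuginfo ./prog, the file prog.debug would still have to be moved to one of the directories that GDB searches.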
So, to use GDB for a userspace program or crash for the kernel, you need to install the debuginfo package. If you use kernel-default on your system (check with rpm -qa '*kernel*'; the quotes keep the shell from expanding the pattern), then the package to install is kernel-default-debuginfo.
The debuginfo packages are usually not in the “main” repository but a special “debuginfo” repository. You should add the “debuginfo” repository to YaST or zypper first. For openSUSE 11.0, just open the “YaST Software Repositories” module, choose Add, Community Repositories and then Main Repository (DEBUG).
Debuginfo Links
Users are often unsure how gdb and crash find the debuginfo file for a binary. The mechanism is documented in the GDB manual, section 15.2 (accessible offline with info gdb on your system).
SUSE uses the .gnu_debuglink mechanism. So the executable contains a section called .gnu_debuglink that contains
- the name of the file that contains the debug information (like vmlinux.debug) and
- a CRC32 checksum of the debuginfo file.
The SUSE kdump package (openSUSE 11.0 only) contains a program, read-debuglink, that prints this link. In its default mode it searches for the debuginfo file using the same algorithm as GDB, so if no debuginfo package is installed, the program prints “No debug information found.”. With the -l parameter (“link only”), it prints only the file name stored in the link, as described above.
To sum up, when a program /bin/ls contains a debuginfo link called ls.debug, GDB searches the following places for the ls.debug file:
- /bin/ls.debug
- /bin/.debug/ls.debug
- /usr/lib/debug/bin/ls.debug
SUSE debuginfo packages write their debuginfo files to /usr/lib/debug.
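The search order above can be sketched as a pair of shell functions. The names debuglink_candidates and find_debuginfo are hypothetical, and the sketch only checks that a file exists; it does not verify the CRC32 checksum that is also stored in the link.

```shell
# Print the locations searched for a debug link, in the order listed above.
# $1: absolute path of the binary, $2: file name stored in .gnu_debuglink
debuglink_candidates() {
    dir=$(dirname "$1")
    printf '%s\n' \
        "$dir/$2" \
        "$dir/.debug/$2" \
        "/usr/lib/debug$dir/$2"
}

# Print the first candidate that exists; fail if none does.
find_debuginfo() {
    debuglink_candidates "$1" "$2" | while IFS= read -r candidate; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            break
        fi
    done | grep .   # grep fails when the loop printed nothing
}
```

For example, debuglink_candidates /bin/ls ls.debug prints the three paths from the list, one per line.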
Kernel Binary Formats
To understand how crash must be invoked, it helps to look at the different kernel binary formats and their naming inside SUSE packages. Like userspace programs, the kernel is built in the Executable and Linkable Format (ELF). This file is usually called vmlinux and is generated directly by the compilation process.
However, not all bootloaders support loading ELF binaries, especially on the x86 (i386 and x86_64) architecture. Because of this, the following solutions exist on the architectures supported by openSUSE and SUSE LINUX Enterprise.
x86 (i386 and x86_64)
Mostly for historical reasons (for example, the Linux kernel is “self-executable” on a floppy disk without any bootloader!), the Linux kernel consists of two parts:
- the Linux kernel itself (vmlinux) and
- the setup code (arch/x86/boot/) that is run by the boot loader (for example GRUB or LILO) in real mode and that loads the Linux kernel.
These two parts are linked together into a file called bzImage in the kernel source tree. In the kernel package, that file is called vmlinuz (note z vs. x).
Because the ELF image is never directly used on x86, the main kernel package contains the vmlinux file in compressed form, called vmlinux.gz.
To sum it up, an x86 SUSE kernel package has two kernel files:
- vmlinuz that is executed by the bootloader
- vmlinux.gz, the compressed ELF image that is required by crash and GDB (uncompressed, see later).
IA64
The elilo bootloader that boots the Linux kernel on the IA64 architecture supports loading ELF images, including compressed ones, out of the box. An IA64 kernel package therefore contains only one file, called vmlinuz, which is in fact a compressed ELF image. So vmlinuz on IA64 is the same as vmlinux.gz on x86!
PPC and PPC64
Like elilo on IA64, the yaboot boot loader on PPC supports loading ELF images, but not compressed ones. A PPC kernel package therefore contains a file vmlinux that is just the ELF Linux kernel. From a crash debugging point of view, that's the easiest architecture.
Opening Kernel Dumps
Two scenarios are possible when doing crashdump debugging:
- having everything installed on the system that produced the core dump,
- using another machine for crashdump debugging.
... in the system where the kernel is installed
Using the system that generated the crash dump (or a system with the same software versions) for debugging is easier, but not always what you want in practice: on a large production server, for example, you probably do not want to run the debugging session on that server.
Two requirements are necessary:
- The kernel-<flavour>-debuginfo package must be installed. Its version must be exactly the same as that of the kernel package that produced the crashdump. If you are not sure about the kernel version that produced the dump, run /sbin/get_kernel_version on the vmcore file.
- The crash package must be installed.
WARNING: Compressed kernel images (gzip, not the bzImage x86 stuff) are supported by the SUSE packages of crash starting with openSUSE 11.0. For older versions, you have to uncompress the vmlinux.gz (x86) or the vmlinuz (IA64) to vmlinux first.
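For older crash packages, the manual uncompression mentioned in the warning could look like the sketch below. The function name uncompress_kernel is an assumption for this example; it reads the image through standard input so that the missing .gz suffix of the IA64 vmlinuz does not matter.

```shell
# Write an uncompressed copy of a gzip-compressed kernel image.
# $1: compressed image (vmlinux.gz on x86, vmlinuz on IA64)
# $2: output file, for example ./vmlinux
uncompress_kernel() {
    # -d decompresses, -c writes to stdout; the original file stays untouched.
    gzip -dc < "$1" > "$2"
}
```

The resulting file can then be passed to crash in place of the compressed image.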
Now to open the crashdump with crash, just use
$ cd /path/to/vmcore
# x86_64 and i386:
$ crash /boot/vmlinux-<version>.gz vmcore
# IA64
$ crash /boot/vmlinuz-<version> vmcore
# PPC64
$ crash /boot/vmlinux-<version> vmcore
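The per-architecture file names above can be captured in a small helper. This is a sketch: the function name kernel_image_for is hypothetical, and it assumes that the version part of the file name in /boot matches the string printed by uname -r.

```shell
# Print the kernel image path that crash expects for an architecture.
# $1: machine name as printed by `uname -m`, $2: kernel version string
kernel_image_for() {
    case "$1" in
        i?86|x86_64) echo "/boot/vmlinux-$2.gz" ;;  # compressed ELF image
        ia64)        echo "/boot/vmlinuz-$2" ;;     # compressed ELF, no .gz suffix
        ppc|ppc64)   echo "/boot/vmlinux-$2" ;;     # plain ELF image
        *)           echo "unsupported architecture: $1" >&2
                     return 1 ;;
    esac
}

# Possible use on the machine that produced the dump:
# crash "$(kernel_image_for "$(uname -m)" "$(uname -r)")" vmcore
```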
... on another system
First, crash does not (yet) support cross debugging. The system on which the dump is debugged must therefore have the same architecture as the system that crashed. An i386 dump can still be debugged on an x86_64 system by installing the i386 crash package there (the 64-bit package has to be uninstalled first), because crash has no library dependencies other than glibc and zlib.
You need the following things on the system where you want to debug the crash:
- the kernel-<flavour> and kernel-<flavour>-debuginfo packages from the system where the crash happened,
- crash must be installed and
- the vmcore (i.e. the dump itself).
Now copy everything into one directory and then extract the RPMs:
$ cd /home/john/crashdump/2008-05-12
$ rpm2cpio kernel-<flavour>.rpm | cpio --extract --unconditional \
--preserve-modification-time --make-directories
$ rpm2cpio kernel-<flavour>-debuginfo.rpm | cpio --extract \
--unconditional --preserve-modification-time --make-directories
Because crash does not find the debuginfo in the extracted tree on its own, symlink both files into the current directory:
$ ln -s boot/vmlinux-<version>-<flavour>.gz .
$ ln -s usr/lib/debug/boot/vmlinux-<version>-<flavour>.debug .
And now invoke crash:
$ crash vmlinux-<version>-<flavour>.gz vmcore
The same restrictions apply to compressed kernel images, i.e. for older versions of crash, you need to uncompress the image manually first.
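The symlink step can be wrapped in a small function that first checks whether both RPMs were really extracted. This is only a sketch; the name link_kernel_files is made up, and the function has to be run from the directory that holds the extracted boot/ and usr/ trees.

```shell
# Symlink the kernel image and its debuginfo from an extracted RPM tree
# into the current directory.
# $1: the <version>-<flavour> part of the file names
link_kernel_files() {
    image="boot/vmlinux-$1.gz"
    debug="usr/lib/debug/boot/vmlinux-$1.debug"
    # Check both files first so that no half-finished state is left behind.
    for f in "$image" "$debug"; do
        if [ ! -f "$f" ]; then
            echo "missing $f: was the matching RPM extracted here?" >&2
            return 1
        fi
    done
    # With a directory as the last argument, ln creates one link per file.
    ln -sf "$image" "$debug" .
}
```

An early error message here is easier to interpret than a crash session that starts without debugging information.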
Rebuilding the Kernel
Everything above requires the debuginfo package for exactly the same kernel binary that produced the crash. However, that package is not always available. For example, we provide no debuginfo packages for maintenance kernels. Even a kernel rebuilt from the identical source does not count as the same binary, so the procedure above will not work with it.
However, if the kernels are “similar” (the same version, ideally built from the identical source code as the kernel that crashed), the following is possible: rebuild the kernel with debug information enabled (set CONFIG_DEBUG_INFO in the .config file), then use the vmlinux binary from the top-level directory of the build tree.
Now it's sufficient if you have the System.map file of the kernel that crashed. That file is in the SUSE kernel package. Now invoke crash with
$ crash System.map-<version>-<flavour> vmlinux vmcore
If that doesn't work (for example, bt produces garbage), then you're out of luck. Unfortunately, it's not guaranteed to work.
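Before starting a rebuild, it may be worth checking that the configuration really enables debug information, since a kernel built without it is of no use to crash. A minimal sketch, with has_debug_info as a made-up helper name:

```shell
# Succeed if the given kernel configuration enables CONFIG_DEBUG_INFO.
# $1: path to the .config file in the kernel source tree
has_debug_info() {
    # A disabled option appears as "# CONFIG_DEBUG_INFO is not set",
    # which this anchored pattern does not match.
    grep -q '^CONFIG_DEBUG_INFO=y$' "$1"
}

# Possible use in the kernel source tree:
# has_debug_info .config || echo "enable CONFIG_DEBUG_INFO first" >&2
```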
Using crash
Since using crash is not SUSE specific at all, just read the Crash whitepaper from the upstream author of crash.
Tips & Tricks
Kernel Command Line
To get the kernel command line in case it's not in the kernel log buffer any more, use
crash> p saved_command_line

