tagline: From openSUSE
Linux Kernel Debugging Introduction
Diagnosing a kernel bug is difficult, but some common categories of problems that might be kernel bugs include:
- Hardware driver issues - missing drivers, missing features, etc
- File system corruption - data corruption, missing files, etc
- Hard locks - system lockups or freezes (see SysRq Key)
- Kernel oopses - printed in dmesg or /var/log/messages
- Performance issues or latency
This document is intended for:
- supporters and consultants, so they know how to provide useful and valuable bug reports.
- developers, who need to know about debugging facilities provided by the Linux kernel
A Classification of Kernel Bugs
Kernel bugs can be categorized in a number of different schemes (e.g. by their impact on security, data integrity, etc). As this document is about debugging, let's try to categorize them by complexity.
- Not working as expected
Some kernel bugs are quite easy to localize and even reproduce, and do not affect the stability of the rest of the system. For instance, a network card may not work in certain configurations, or NFS produces weird errors.
These are often fairly easy to debug. All it takes is a precise description of what happens, and a detailed description of how to reproduce the problem. And of course a kernel hacker who has a good understanding of the subsystem :)
- Kernel Oopses
If the kernel runs into an unexpected error condition (e.g. when dereferencing an invalid pointer), an exception handler will catch this error, abort the current process, and try to log a message describing the exception in some detail. Normally, the oops message is logged to the system log. It is also possible to display the message on the text console, but this is turned off by default and needs to be enabled manually.
As this log message starts with the words "Kernel Oops", the entire event is commonly referred to as "oopsing".
When an oops occurs, the process that caused it was usually in the middle of some dreadfully important business inside the kernel, and was holding one or more locks. As the process is terminated immediately, none of these locks will be released, so that other processes trying to claim these locks later will hang forever.
If the user is in an X Windows session while the oops occurs, the machine may appear to be hung completely, because the X server got stuck on some stale kernel lock and does not react anymore.
- Kernel panics
If an oops occurs inside an interrupt handler, the kernel will try to deliver the oops message, and then halt completely, because there is no sane way to recover after an interrupt handler crashed. This is called a kernel panic.
In the case of a kernel panic, the oops will not be written to syslog, simply because the syslog daemon will not be scheduled anymore.
- Soft lockups
Some kernel bugs do not trigger oopses, but simply freeze the machine. They can be caused by deadlocks or livelocks, among other things. In most cases (unless some stupid bug causes an interrupt handler to spin on a lock), these soft lockups will not prevent delivery of interrupts.
If interrupt delivery is still possible, the machine will react to pings, and keyboard input will be echoed on the text console. Still, the machine will be mostly unresponsive, because processes do not make any progress anymore.
A good way to test for this is to hit the numlock or capslock key; if the keyboard LEDs are turned on and off, you have a deadlock somewhere.
Also, a 2.4 kernel will start flashing the numlock LED all by itself if it got hung. 2.6 doesn't have this feature yet.
- Hard lockups
A hard lockup may occur as well; these are usually due to hardware problems (or excessive abuse of the hardware by some poorly written driver). If this happens, you're in trouble. You can stop reading now, and we wish you good luck in debugging this.
Getting the right Information
Users have a tendency to blame software problems on that part of the system that seems to be the most complex or mysterious from their point of view. Which in most cases is the kernel.
That doesn't mean they have to be wrong. But it often means that their bug reports are inaccurate, or omit important details.
So when you're tasked with debugging a Linux kernel bug (or what someone thinks is a Linux kernel bug), there are a number of questions about the symptoms you should ask first:
- Hangs versus crashes
Is the machine hanging? Did it crash? Did it just appear to have crashed?
Even experienced users will not always remember to check the syslog for oops messages, especially if the kernel just behaves "slightly strange" without crashing and burning spectacularly. You can save yourself a lot of work if you ask them to check the syslog for oops messages nevertheless.
As described above, if the kernel oopsed while the user was using X, it may appear to them as if the machine was hanging.
On a 2.4 kernel, there is one indication that will tell you that the kernel panicked, even if you're in X: the numlock LED will start flashing. 2.6 doesn't have this feature yet.
In this case, it always helps to reproduce the problem after changing to the text console. If the machine isn't hung hard, the kernel will at least accept keyboard input. On the console, it will also be possible to capture additional information on the crash. As a minimum, you should tell the kernel to display the oops message on the console as well, using
# klogconsole -r0 -l8
- If the oops isn't written to the syslog (e.g. when the oops occurs inside an interrupt), capturing the output with a digital camera may still help (but please make sure that any images you attach to a bug report don't exceed 512k).
Alternatively, one can try to capture the oops via a serial console.
In addition, you may want to enable the sysrq key and capture some sysrq information, as described in section "Capturing sysrq information" below.
- Eliminate well-known problems
There are certain classes of problems that are common as dirt. Be aware of these problem areas and try to get these out of the equation early on.
- Item Number One on the list of annoying issues is probably ACPI:
- Most of the time when a user reports a problem with a machine not booting properly, or hardware not getting set up correctly, this is caused by bad ACPI BIOS tables.
- Try to boot with acpi=off to turn off ACPI entirely.
- [list of all ACPI related kernel command line variables goes here]
- Other Very Common Problems?
- Eliminate non-essential variables
User bug reports often describe fairly specific scenarios, such as "I am using a USB disk with reiserfs on it exported via NFS while listening to my mp3s and all of a sudden the machine crashes".
This is a nice and accurate report involving almost every subsystem the kernel has (block device layer, VM, VFS, network, sound, ...)
To help you narrow down the problem, here is a bunch of things you can try:
- does the problem exist with older/newer kernel versions as well?
- if you take component X out of the equation, can the bug still be reproduced?
- if you exchange component X for another, equivalent component (e.g. replacing reiserfs with ext3), does the problem persist?
- if the problem implicates memory corruption or random hardware failure: can the problem be reproduced on a different machine?
- Especially on large machines, random memory corruption could be caused by hardware problems with bad RAM. To diagnose bad RAM, use the installation CD, select memtest86 and run it for 24 hours.
Capturing Oops info
As described previously, a kernel crash will result in an Oops that gets written to the system log. The oops messages contain a great deal of information that can at least help you diagnose the problem.
In (relatively speaking) benign cases, the kernel will be able to continue despite the error condition, and the system will be stable enough to at least write the oops into the system log. If the kernel panicked, however, it is far from trivial to record an oops.
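As a rough illustration of checking the system log for an oops, the following sketch lists lines that look like the start of an oops or panic. The function name scan_oops and the marker list are assumptions made for this example; the markers are not exhaustive, and the log file location depends on the system.

```shell
#!/bin/sh
# Sketch: list lines in a log file that look like the start of a kernel
# oops or panic. The marker list is an assumption and is not exhaustive.
scan_oops() {
    # $1: log file to scan, e.g. /var/log/messages or a saved dmesg output
    logfile=$1
    # -n prints line numbers so the surrounding context can be found later.
    grep -nE 'Oops|BUG: |Kernel panic|general protection fault' "$logfile"
}
```

For example, "scan_oops /var/log/messages" prints each matching line prefixed with its line number; the lines following a match usually contain the register dump and call trace.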
Your worst enemy is the desktop. Any kernel messages printed to the console while the X server is running will not show up on the screen. The X server has a way of capturing messages printed on /dev/console and displaying them to you, but if the bug is bad enough to prevent syslogd and klogd from writing the oops to the syslog, the chances of X actually being able to display anything useful to you are very small.
So if you're able to reproduce the problem in some way, the first thing you should do is switch to a text console and bump the console logging info:
klogconsole -r0 -l8
This will switch the kernel's console log level to display anything it sends to syslogd on the virtual console as well. This includes any kernel oopses; if you trigger the kernel bug now, you will at least get a screenful of oops information.
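To confirm that the console log level actually changed, the kernel exposes it as the first field of /proc/sys/kernel/printk. The helper below is a minimal sketch; the function name console_loglevel is made up for this example, and the optional path argument exists only so the function can be tried on a copy of the file.

```shell
#!/bin/sh
# Sketch: report the current console log level.
# The first field of /proc/sys/kernel/printk is the console log level;
# kernel messages with a lower priority number are shown on the console.
console_loglevel() {
    # $1: path to the printk sysctl file (defaults to the real one)
    file=${1:-/proc/sys/kernel/printk}
    # Fields are whitespace separated; print only the first one.
    awk '{ print $1 }' "$file"
}
```

After running klogconsole -r0 -l8, calling console_loglevel without arguments should report 8.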
Note: if you're doing this frequently, please refer to the section on serial consoles below - this is really the preferred method, but as a first stab, just being able to read the oops is a major win. To learn how to read an oops, please refer to the file oops-reading.
Starting with openSUSE 12.2 pre-releases, kdump has the ability to dump quickly and only capture the oops. YaST should eventually be updated to configure this automatically, but for now the following instructions should be used.
- Install the yast2-kdump and kdump packages.
- Use yast2-kdump to enable kdump
- Edit /etc/sysconfig/kdump to change $KDUMP_DUMPFORMAT to "none" -- this will disable dumps in general and only save the Oops.
- Reboot to enable the crashkernel zone in the kernel. Your system will operate normally until you reboot but will not capture Oopses.
- Please note that enabling dump uses 128M of RAM except on large systems (more than 512GB RAM) where it must use more to accommodate larger page tables. Once you have reported your issue with the Oops generated, you may want to disable it again.
Note: Configuration via YaST doesn't currently work due to bnc#773143
- You will need to manually add "crashkernel=256M-:128M" to your grub configuration and reboot instead of using YaST.
Now attempt to reproduce your problem. Your system should reboot nearly immediately when it occurs after dropping to a text console. When you reboot, there should be a log file in /var/log/dump/
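Before trying to reproduce the problem, it can help to verify the two preconditions from the steps above: that the running kernel was booted with a crashkernel= parameter, and that KDUMP_DUMPFORMAT is set as intended. This is a sketch under the assumption that the sysconfig file uses plain VARIABLE="value" lines; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: check the kdump preconditions described above.

has_crashkernel() {
    # $1: kernel command line string, e.g. "$(cat /proc/cmdline)"
    # Succeeds if any parameter starts with crashkernel=
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

kdump_dumpformat() {
    # $1: path to the kdump sysconfig file (defaults to /etc/sysconfig/kdump)
    file=${1:-/etc/sysconfig/kdump}
    # Print the value of KDUMP_DUMPFORMAT, stripping optional double quotes.
    sed -n 's/^KDUMP_DUMPFORMAT="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}
```

A possible check after the reboot: has_crashkernel "$(cat /proc/cmdline)" should succeed, and kdump_dumpformat should print none.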
- This should not be needed for any openSUSE release, but may be useful for self-built kernels.
A kernel oops usually includes a dump of the current processor state, including registers, the instruction pointers, and a function call back trace. For this to be of any use to the kernel developer, these addresses must be mapped to function and/or variable names, if possible.
Current SUSE kernels support a feature called "kallsyms" where the running kernel includes a symbol table of itself, which allows it to resolve the addresses automatically when printing an oops.
Older kernels do not have this feature, so the oops printed will contain just the raw addresses, which need to be converted by a user space application.
This is what ksymoops is for: you can feed ksymoops a raw oops on standard input, and given the right symbol information, it will provide you with a cooked version of that oops that maps all the symbols and a disassembly listing of the hex instructions. It should not be used when kallsyms is turned on (which holds true for openSUSE kernels).
The crux of the matter is providing ksymoops with the right symbol information. This information is usually taken from the vmlinux image and the System.map file in /boot, which must exactly match the version of the kernel that generated the oops. Therefore, it is usually a good idea to run ksymoops on the machine where the crash happened.
If it is to resolve module symbols properly, ksymoops also needs the list of kernel modules and their location in memory. A good way of providing this is to copy the file /proc/modules immediately before or after the oops occurred, and specify this copy on the ksymoops command line using the -l option:
# ksymoops -l /tmp/proc-modules-copy < /tmp/my-oops
Fortunately, much of this work is already done automatically when the oops is captured by syslogd, because syslogd will do all the symbols translations for you. This has the great advantage that it always uses the correct symbol list.
Of course, oopses captured via the serial console will not have their addresses massaged by syslogd, so you will have to run ksymoops manually in this case.
sysrq means "system request". This is the name for a bunch of magic key combinations that will tell the kernel to display various types of internal information, sync the file system or kill a task. Since this is somewhat security sensitive (esp. the task killing part), the sysrq keyboard commands are disabled by default.
One way to enable sysrq is to execute the following command at the shell prompt:
echo 1 > /proc/sys/kernel/sysrq
In addition, you may want to edit /etc/sysconfig/sysctl and change the variable ENABLE_SYSRQ to "yes". This will ensure that sysrq is enabled after reboot.
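To see whether sysrq is currently enabled, the value can be read back from the same file. The sketch below interprets it; the function name describe_sysrq is made up for this example. It relies on the kernel's documented convention that 0 disables sysrq, 1 enables all functions, and larger values are a bitmask selecting individual functions.

```shell
#!/bin/sh
# Sketch: interpret the value found in /proc/sys/kernel/sysrq.
describe_sysrq() {
    # $1: numeric value, e.g. "$(cat /proc/sys/kernel/sysrq)"
    case $1 in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "partially enabled (bitmask $1)" ;;
    esac
}
```

Usage: describe_sysrq "$(cat /proc/sys/kernel/sysrq)"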
To use sysrq, you need to press a "magic" key combination plus a command key. This magic key combination depends on the hardware platform, but on most platforms it is ALT-SysRQ. On some keyboards, the SysRQ key is labelled "PrtScr" or "Print"; it's usually located to the right of the function keys.
Most sysrq keys will cause the kernel to report status information to the kernel console. In the default configuration, a SUSE system has all kernel generated output redirected to tty10, so you need to switch to console 10 or redirect the kernel console to a different tty using klogconsole.
The most helpful command key is "h", which displays a short help text:
SysRq : HELP : loglevel0-8 reBoot tErm kIll saK showMem powerOff showPc unRaw Sync showTasks Unmount
Here's a description of the most important commands:
- 0-8: These keys change the console log level to the indicated level. 8 will display everything on the console, 1 will be critical messages only, and 0 turns console logging off entirely.
- M: Display current memory statistics.
- P: Display current processor registers, instruction counter, call trace and list of loaded modules. This is essentially the process related information that would get printed as part of an oops.
- T: Shows a listing of all tasks, including the back trace of their current kernel stack. Beware, this list can be very long.
- U: Try to re-mount all currently mounted file systems read-only.
- E: Send a TERM signal to all processes except init.
- I: Send a KILL signal to all processes except init.
There are a number of other sysrq keys; a complete list is available from Documentation/sysrq.txt in the kernel source.
It is also possible to trigger sysrq commands from the command line, which is very useful if you do not have keyboard access (e.g. when debugging a problem remotely). In this case, simply echo the letter to /proc/sysrq-trigger and read back the information from dmesg or the syslog files:
# echo t > /proc/sysrq-trigger
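When triggering sysrq commands from scripts, a small wrapper can guard against typos before anything is written to the trigger file. This is only a sketch: the function name sysrq_send is hypothetical, and the second argument exists so the function can be tried against an ordinary file instead of the real trigger, which requires root.

```shell
#!/bin/sh
# Sketch: send a single sysrq command key to the trigger file.
sysrq_send() {
    # $1: command key, e.g. t (tasks), m (memory), p (registers)
    # $2: trigger file; defaults to the real /proc/sysrq-trigger (needs root)
    key=$1
    trigger=${2:-/proc/sysrq-trigger}
    case $key in
        [0-9A-Za-z]) ;;   # exactly one letter or digit is accepted
        *) echo "sysrq_send: invalid key '$key'" >&2; return 1 ;;
    esac
    printf '%s\n' "$key" > "$trigger"
}
```

For a remote task dump, one might run sysrq_send t as root and then save the tail of the dmesg output to a file for the bug report.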
Using the serial console
Some kernel oopses (especially during boot-up) might occur when the system console is unusable. A serial console helps to get a reliable dump report in this case. You'll need a second machine and a null-modem cable (i.e. a serial cable with two identical connectors); additionally, both machines need to have a serial port. This leaves some modern machines out of the equation, I'm afraid; you'll have to try to use netconsole on them.
Once you've hooked the cable up, you should add 'console=ttyS0,115200 console=tty0' to the kernel command line of the debuggee. This causes all console messages to be sent to ttyS0 as well as the standard console. The last console= parameter determines where the console input should be handled from; so if you want the serial console to accept input as well, you'll have to swap those parameters.
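Because the order of the console= parameters matters, it can be worth checking which one comes last on the running system. The following sketch extracts the last console= value from a command line string; the function name primary_console is an assumption for this example.

```shell
#!/bin/sh
# Sketch: print the value of the last console= parameter on a kernel
# command line, i.e. the console that also handles input.
primary_console() {
    # $1: kernel command line string, e.g. "$(cat /proc/cmdline)"
    # Split on spaces, keep only console= values, and take the last one.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^console=//p' | tail -n 1
}
```

With the command line shown above, primary_console "console=ttyS0,115200 console=tty0" prints tty0; with the two parameters swapped it prints ttyS0,115200. Nothing is printed if no console= parameter is present.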
On the receiving machine you just fire up (as root) screen with the command:
# screen -T vt100 /dev/ttyS0 115200
(Assuming that the serial connection is hooked onto the first serial port.) Then reboot the debuggee and watch.
To capture any oops it's easiest to enable logging from screen, see the screen manual on how to do that.
What should I do with a Kernel OOPS?
See the separate OOPS reading document.
How can I debug a kernel problem?
See the separate Kernel Debugging Introduction document.
Where to report results of your debugging?
If you are lucky enough to arrive at a patch, please report the defect to us via Bugzilla. Remember to select the component Kernel for optimal routing of your bug report within SUSE.
Even if you weren't able to produce a fix for the bug, please report at least the information you have collected so far. That will help us take it further.