SDB:Debugging boot hang



Situation

You try to boot a system but it hangs while booting.

Procedure

Initcall debug

In GRUB, edit the kernel command line: remove the quiet option and the splash= option (whatever its value), and add debug and initcall_debug in their place.

Then capture the output:

  • Use a console, either serial or netconsole (netconsole works only if the network is already up at the point of the hang)
  • As a last resort, photograph the screen with a camera

If a crash is now visible in the output, file a bug at [bugzilla.opensuse.org] and attach this log.

Otherwise, continue with the next section, noting the last few "calling" messages that have no matching "initcall ... returned" message. In other words, you are looking for module init functions that had not returned at the point of the hang.

Example

The GRUB entry shows:

kernel /boot/vmlinuz-2.6.150 root=/dev/rootdisk quiet other_option splash=silent

You change it to:

kernel /boot/vmlinuz-2.6.150 root=/dev/rootdisk other_option debug initcall_debug
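
The same edit can be written down as a small function, which may help if you need to apply it to several boot entries. This is only a sketch: the function name enable_initcall_debug is hypothetical, and it treats the command line as a plain whitespace-separated string rather than parsing any GRUB configuration format.

```python
def enable_initcall_debug(cmdline):
    """Drop 'quiet' and any 'splash=...' option, then append
    'debug' and 'initcall_debug' if they are not already present.

    Hypothetical helper for illustration; it only manipulates a string.
    """
    kept = [
        opt for opt in cmdline.split()
        if opt != "quiet" and not opt.startswith("splash=")
    ]
    for opt in ("debug", "initcall_debug"):
        if opt not in kept:
            kept.append(opt)
    return " ".join(kept)


if __name__ == "__main__":
    before = ("kernel /boot/vmlinuz-2.6.150 root=/dev/rootdisk "
              "quiet other_option splash=silent")
    print(enable_initcall_debug(before))
```

Applied to the entry above, it yields the changed line shown in this example. All other options are kept in their original order.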

Then the log shows something like:

calling  e1000_init_module+0x0/0x82 [e1000] @ 338
e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI 
e1000: Copyright (c) 1999-2006 Intel Corporation.
calling  parport_pc_init+0x0/0xa5 [parport_pc] @ 330
calling  ppdev_init+0x0/0xc8 [ppdev] @ 374
initcall parport_pc_init+0x0/0xa5 [parport_pc] returned 0 after 91246 usecs
ppdev: user-space parallel port driver
initcall ppdev_init+0x0/0xc8 [ppdev] returned 0 after 281605 usecs
e1000 0000:00:03.0: eth0: (PCI:33MHz:32-bit) 52:54:00:12:34:56

Here we can see that the kernel called e1000_init_module from the e1000 module, parport_pc_init from parport_pc, and ppdev_init from ppdev. The latter two returned zero (no error) after about 91 ms and 282 ms respectively, but e1000_init_module never returned.

So we note that e1000's module load function doesn't return and continue with the next section.
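
With a long captured log, matching "calling" lines against "initcall ... returned" lines by eye is error-prone. The following is a minimal sketch of doing it with a script, assuming the log was saved as text and that the messages have the same shape as in the example above. The function name pending_initcalls and the regular expressions are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Message shapes taken from the example log above (assumed stable):
#   calling  NAME+0xOFF/0xSIZE [module] @ PID
#   initcall NAME+0xOFF/0xSIZE [module] returned RC after N usecs
CALLING_RE = re.compile(r"calling\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+")
RETURNED_RE = re.compile(
    r"initcall\s+(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+.*\breturned\b"
)


def pending_initcalls(log_text):
    """Return the init functions that were called but never returned,
    in the order they were called."""
    pending = []
    for line in log_text.splitlines():
        match = RETURNED_RE.search(line)
        if match:
            # This init function finished; it is no longer a suspect.
            if match.group(1) in pending:
                pending.remove(match.group(1))
            continue
        match = CALLING_RE.search(line)
        if match:
            pending.append(match.group(1))
    return pending


if __name__ == "__main__":
    sample = """\
calling  e1000_init_module+0x0/0x82 [e1000] @ 338
calling  parport_pc_init+0x0/0xa5 [parport_pc] @ 330
calling  ppdev_init+0x0/0xc8 [ppdev] @ 374
initcall parport_pc_init+0x0/0xa5 [parport_pc] returned 0 after 91246 usecs
initcall ppdev_init+0x0/0xc8 [ppdev] returned 0 after 281605 usecs
"""
    print(pending_initcalls(sample))  # prints ['e1000_init_module']
```

The patterns use search rather than match, so lines that carry a timestamp or log-level prefix should still be recognized. Anything left in the returned list is a candidate for the next section.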

bash as init

As in the previous section, edit the GRUB entry, this time to boot with bash as init. It is enough to add init=/bin/bash.

In many cases this helps to track the problem down, because only the core drivers (those loaded by the initrd) are loaded.

When you boot with bash as init, the system drops you into a shell. There are several things to check:

  • Check whether the suspect module really hangs the system

Continuing the previous example, you would run modprobe e1000 and see what happens.