Debugging hung machines can be a bit tricky. Here I’ll document methods to trigger a crashdump when these hangs occur. What exactly does it mean when a machine ‘hangs’ or ‘freezes-up’? More information can be found in the kernel documentation, but overall there are really a few types of hangs.
A “Soft Lock-Up” is when the kernel loops in kernel mode for a duration without giving tasks a chance to run. A “Hard Lock-Up” is when the kernel loops in kernel mode for a duration without letting other interrupts run. In addition a “Hung Task” is when a userspace task has been blocking for a duration.
Thankfully the kernel has options to panic on these conditions and thus create a proper crashdump. In order to setup crashdump, on an Ubuntu machine we can do the following. First we need to install and setup crashdump, more info can be found here
sudo apt-get install linux-crashdump
Select NO unless you really would like to use kexec for your reboots. Next we need to enable it since by default it is disabled.
sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/' /etc/default/kdump-tools
Reboot to ensure the kernel cmdline options are properly setup
After reboot run the following:
sudo kdump-config show
If this command shows ‘ready to dump’, then we can test a crash to ensure kdump has enough memory and will dump properly. This command will crash your computer, so hopefully you are doing this on a test machine.
echo c | sudo tee /proc/sysrq-trigger
The machine will reboot and you’ll see a crash in /var/crash. All of this is already documented here, so now we need to enable panics for hang and lockup conditions. Now we need to enable crashing on lockups, so we’ll enable many cases at once. Edit /etc/default/grub and change this line to the following:
GRUB_CMDLINE_LINUX="nmi_watchdog=panic hung_task_panic=1 softlockup_panic=1 unknown_nmi_panic"
In addition you could enable these via /proc/sys/kernel or sysctl. For more information about these parameters there is documentation here If you’ve used the command line change, update grub and then reboot.
sudo update-grub sudo reboot
Now your machine should crash when it locks up, and you’ll get a nice crashdump to analyze. If you want to test such a setup I wrote a module that induces a hang to see if this works properly.