Busybox silently failing at startup

I had a problem at work recently that drove me bonkers for a long time. The embedded linux product I am working on would start to boot up just fine but then freeze at “Freeing init memory.” For the longest time, I thought the device files for the serial connection were not configured properly, causing the remaining output to get lost. Turns out the problem was a lot more troublesome.

The real problem was that my uclibc dynamic linker .so library was not built/configured properly. I used the “generic” linux headers that were included in the (very old) legacy buildroot distribution for this product. (It’s so old in fact that I believe it predates the name “buildroot.”) The installation scripts said that the generic headers would be fine, but apparently they weren’t, even though the whole toolchain was built successfully. Because the headers were wrong, the ld library caused a silent segmentation fault as soon as it tried to run busybox init. I really learned my lesson that I need to build my toolchain with the exact same headers as the kernel!

Debugging this was a pain. First, I built a static executable to just print a message then sleep for a long time, and set this as my initial executable in the kernel boot arguments. I saw the message just fine, so the serial connection was working, and static inits appeared to work. The problem had to be elsewhere.

I then built busybox statically instead of dynamically and ran its init as the initial executable. That worked just fine, so something was wrong with the dynamic executable.

With the machine booting up with the static busybox exe, I could pull the dynamic exe onto the system via ftp and execute it manually to see what was going on. I also added a few printfs into the init_main busybox function to quickly see where the hang up was. I ran the binary and, ah ha, segmentation fault. But I didn’t get any print outputs to the screen, so the segfault was happening before entering int_main. Weird.

I found some debug compile options for uClibc that enabled dynamic linker debugging, rebuilt, and ran the dynamic busybox init manually using the new linker library. The failure occurred in what looked like a kernel system call. I believed my kernel was fine, so the problem had to be with the library. I chased down the includes and discovered the wrong headers were being used.

It sounds like a very methodical debugging session, but this problem kicked my butt for days. I hope this post prevents similar frustration for someone else.