Busybox silently failing at startup

I had a problem at work recently that drove me bonkers for a long time. The embedded Linux product I am working on would start to boot up just fine but then freeze at “Freeing init memory.” For the longest time, I thought the device files for the serial connection were not configured properly, causing the remaining output to get lost. Turns out the problem was a lot more troublesome.

The real problem was that my uClibc dynamic linker .so library was not built/configured properly. I used the “generic” Linux headers that were included in the (very old) legacy buildroot distribution for this product. (It’s so old in fact that I believe it predates the name “buildroot.”) The installation scripts said that the generic headers would be fine, but apparently they weren’t, even though the whole toolchain was built successfully. Because the headers were wrong, the ld library caused a silent segmentation fault as soon as the kernel tried to run busybox init. I really learned my lesson that I need to build my toolchain with the exact same headers as the kernel!

Debugging this was a pain. First, I built a static executable to just print a message then sleep for a long time, and set this as my initial executable in the kernel boot arguments. I saw the message just fine, so the serial connection was working, and static inits appeared to work. The problem had to be elsewhere.

I then built busybox statically instead of dynamically and ran its init as the initial executable. That worked just fine, so something was wrong with the dynamic executable.

With the machine booting up with the static busybox exe, I could pull the dynamic exe onto the system via FTP and execute it manually to see what was going on. I also added a few printfs into the init_main busybox function to quickly see where the hang-up was. I ran the binary and, aha, segmentation fault. But I didn’t get any print outputs to the screen, so the segfault was happening before entering init_main. Weird.

I found some debug compile options for uClibc that enabled dynamic linker debugging, rebuilt, and ran the dynamic busybox init manually using the new linker library. The failure occurred in what looked like a kernel system call. I believed my kernel was fine, so the problem had to be with the library. I chased down the includes and discovered the wrong headers were being used.

It sounds like a very methodical debugging session, but this problem kicked my butt for days. I hope this post prevents similar frustration for someone else.

Haskell desktop GUIs with bindings-cef3

While web technologies and apps “in the cloud” are easily the dominant trend in application programming these days, there are still some situations in which reliable old desktop apps are superior. For example, most of my experience is in the defense industry, and there is no way the cloud would get trusted with sensitive files without appropriate protections. So how can we leverage the efforts and progress of web interface programming on the desktop, especially with regard to Haskell?

One option that I have some experience with is threepenny-gui, which uses the web browser as an interface to a Haskell backend server that runs locally (or over a LAN – really any low latency situation). An early version of threepenny is used in FNIStash. While effective, something just feels wrong to me about running desktop applications inside a browser application. Really what I’d like to do is incorporate browser functionality inside my applications instead of relying on Google, Mozilla, or even Microsoft.

Enter the Chromium Embedded Framework. It’s a fully functioning, bare-bones browser and JavaScript engine packed inside a .dll or .so file. Add some tabs, a URL bar, bookmarks, etc., and you have a modern browser. Alternatively, you can use CEF to provide HTML and JS driven GUIs for desktop applications. Indeed, the Steam client does this very thing! Adobe uses it as well. In fact, it’s pretty popular.

CEF provides a C API, which we can call from Haskell with appropriate bindings. I cut my teeth recently with bindings-dsl by defining the bindings to the low level interface of the HDF5 library. Bindings to the CEF library seemed like a good follow-on activity, especially since I’d like to use it in my own Haskell projects.

So I created bindings-cef3, Haskell bindings for the CEF3 C API. The package includes an optional example program that creates a browser window and loads a URL. There’s a snapshot below. These bindings are very low level, so there is ample opportunity to wrap them in a smarter, more convenient Haskell API. This is something I plan to do myself eventually, unless a more experienced Haskeller beats me to it!

Beware that because of the multiple interdependent types in CEF3, nearly all of the bindings are provided in a single file. It can take a while to preprocess and compile. When I was working on it, I had to up the RAM on my virtual machine from 4 GB to around 5 GB to avoid cryptic gcc errors. If anyone knows a better way to structure the library to avoid these hurdles, I’d love to hear it.

The library is not on Hackage as of this writing, and that is intentional. The current version only supports Linux, and being an occasional Windows Haskell developer myself, I don’t like the idea of keeping out Windows (and MacOS) users. Only a little more coding effort and a decent amount of testing effort are required for other platforms, and I welcome contributions from anyone wanting to help. Once it has been vetted a little more and better compatibility is implemented, joining the other bindings libraries on Hackage seems only natural.

While bindings to CEF3 are just a first step toward having “true” Haskell desktop GUIs driven by web technologies, they’re an important step. More experimentation is needed to uncover quirks and limitations; I’m certainly no CEF expert. In fact, I probably know only as much as is necessary to get the example application working 🙂 . Hopefully this library will help out anyone frustrated with Haskell GUIs like I have been before!

Screenshot of cefcapi example program.


Improving memory performance in Haskell programs

I ran into a big problem recently in FNIStash – I realized that it was using way more memory than I thought it should.  What followed was a couple weeks of arduous investigation and learning about Haskell.  If you find your program has a memory leak that is not due to thunks, these tips might help you out.  Note that this all applies to the de facto standard compiler, GHC.

Understanding memory usage

The first step to figuring out why your program is using a lot of memory is to measure it.  You can enable profiling by compiling your program with the -prof and -rtsopts compilation flags.  This enables profiling and allows you to control it from the RTS options passed from the command line when you run the program.  For example,

MyProgram.exe +RTS -hy

will run the program and categorize memory usage by type.  There are lots of categorization options available, such as by module or function.

When your program terminates, you’ll get a .hp file in the same directory as the executable.  You can convert it to a viewable PS or PDF file using

hp2ps -e8in -c MyProgram.hp 
ps2pdf MyProgram.ps

The result is something like this.

Memory profile of FNIStash categorized by type.


Here we can see that most of the memory is used up by ARR_WORDS, which is a backing type for ByteString and Text, among other things.  Since FNIStash reads and parses an archive of binary files, this makes sense.  However, the magnitude of the plot maxes out at around 25 MB.  Why, then, does Windows report over 70 MB of memory used?

Memory footprint of FNIStash

The discrepancy is due to a few factors:

  1. First, the profiling doesn’t come for free.  Compiling with the -prof option necessitates more memory just for reporting purposes.  This requires approximately 30% more memory than usual.  This overhead isn’t reported on the plot, but the OS sees it.
  2. The reported memory usage is only “live memory.”  It doesn’t include old data that has not been collected by the garbage collector yet.  In fact, in very loose terms, the default GHC GC lets the heap grow to about twice the size of the live data before it collects.  This means your program might use twice as much memory as needed.  Note that when the garbage collector does run, it temporarily needs more space to reorganize the live data, so if a large amount of data is being collected at once, you can see memory usage (as reported by the OS) spike up.
  3. FNIStash, in particular, has some kind of memory leak.  Right after start up, the OS memory usage was around 66 MB, but crept up to 70 MB after some minor GUI interaction.  It’s actually unclear to me whether the leaked memory is reported in the plot.

So if we discount the leaked memory, then we have 66 MB / 1.3 / 2 = 25 MB, which is right in line with what the graph reports.  Turning off profiling saves an easy 30% of OS memory usage, but what if you want to save more?  How can the memory leak be fixed?  Are you stuck with using twice as much memory as necessary?
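As a quick sanity check, that back-of-the-envelope arithmetic can be written out as a tiny Haskell sketch.  The name estimateLiveMB and its parameters are made up for illustration, and the 1.3 and 2 are only the rough profiling overhead and GC growth factor discussed above, not measured constants.

```haskell
import Text.Printf (printf)

-- | Rough estimate of live heap (in MB) from the OS-reported footprint.
-- Hypothetical helper: divides out the approximate profiling overhead
-- and the GC growth factor described in the list above.
estimateLiveMB :: Double -> Double -> Double -> Double
estimateLiveMB osMB profOverhead gcFactor = osMB / profOverhead / gcFactor

main :: IO ()
main =
  -- 66 MB seen by the OS, ~30% profiling overhead, growth factor of 2
  printf "%.1f MB\n" (estimateLiveMB 66 1.3 2)  -- prints "25.4 MB"
```

Setting the overhead to 1 gives a feel for what the same program might need once profiling is switched off.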

Fixing the leak

Fixing the memory leak was a Sisyphean process of changing data types, adding strictness, and other futile efforts until I found the magic bullet: the threaded runtime system.  By default, GHC links a single-threaded RTS (meaning 1 OS thread) that can nonetheless provide concurrency.  Indeed, FNIStash uses multiple threads to run the frontend and backend separately.  The -threaded flag enables the multi-threaded RTS (multiple OS threads), and using it completely obliterated the terrible memory creep I was seeing in the Task Manager.  I don’t know why this is.  My suspicion is that the idle-time GC that comes with -threaded cleans up more aggressively.  Given the number of cores on modern machines, I plan to make this my standard in the future.
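If you want to confirm which runtime a binary was actually linked against, here is a minimal sketch.  It is not FNIStash code: backendWork is a hypothetical stand-in for a real backend, and the only assumption is that the program is built once with and once without -threaded.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- | Hypothetical stand-in for whatever the backend thread really does.
backendWork :: Int -> Int
backendWork n = sum [1 .. n]

main :: IO ()
main = do
  -- True when the program was linked with -threaded, False otherwise.
  putStrLn ("Threaded RTS: " ++ show rtsSupportsBoundThreads)
  caps <- getNumCapabilities
  putStrLn ("Capabilities: " ++ show caps)

  -- forkIO gives concurrency under either RTS; the "frontend" (main)
  -- waits on an MVar for the "backend" thread's result.
  result <- newEmptyMVar
  _ <- forkIO (putMVar result $! backendWork 1000000)
  answer <- takeMVar result
  putStrLn ("Backend result: " ++ show answer)
```

The program behaves the same either way, which is the point: forkIO threads work under both runtimes, and only the first line of output changes depending on whether -threaded was passed to GHC.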

Tuning the garbage collector

GHC enables the user to tune the garbage collector from the command line using RTS arguments.  Here are the ones that I tested out.  To use them, your program must be compiled with the -rtsopts flag.  Additionally, and this is totally anecdotal on my part, it seems that having -prof enabled at the same time prevents any of these options from having an effect, so if you notice this you might want to compile without -prof.

  • -c : This option changes the GC algorithm to use a slower compacting strategy instead of a copying strategy.  What I think this means is that instead of copying bytes around the heap when running GC, data gets compacted near where it already is.  The end result is less heap usage.  In my case, using this option yielded savings of around 10%.
  • -Ffactor : Here “factor” is a value.  By default, GHC uses a value of 2.  This is the growth factor allowed before GC is run.  If you make this smaller, GC will be run more often.  However, this didn’t seem to help me out much.
  • -Gx : If x = 1, this enables the single generation collector.  I used this option without any others, and it actually increased my memory usage considerably and made my program much slower.  I also tried -G3, but that used a lot more memory as well.
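To compare these options with something more precise than the Task Manager, one approach is to have the program report its own GC statistics through GHC.Stats.  This is only a sketch: it assumes GHC 8.2 or newer and that the program is started with +RTS -T so that statistics are collected, and the workload and the toMB helper are placeholders.

```haskell
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)
import Text.Printf (printf)

-- | Convert a byte count to megabytes (hypothetical helper).
toMB :: Integral a => a -> Double
toMB bytes = fromIntegral bytes / (1024 * 1024)

main :: IO ()
main = do
  -- Placeholder workload: allocate something so the collector has work to do.
  let xs = [1 .. 2000000 :: Int]
  print (length xs)
  print (sum xs)

  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "No statistics available; rerun with +RTS -T"
    else do
      performGC -- force a major collection so the numbers are current
      stats <- getRTSStats
      printf "collections:      %d\n" (gcs stats)
      printf "peak live data:   %.1f MB\n" (toMB (max_live_bytes stats))
      printf "peak memory used: %.1f MB\n" (toMB (max_mem_in_use_bytes stats))
```

Running the same binary with +RTS -T, then +RTS -T -c, then something like +RTS -T -F1.5 should make it easier to see which option actually moves the numbers for a given program.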

In the end, I decided that killing the leak with the threaded RTS was enough for me.  I don’t need extra memory savings by using -c for the moment.  So I went from 70+ MB with a leak down to 48 MB with no leak by using -threaded and removing -prof.  The real benefit, however, was learning more about GHC’s runtime system.

Sprite Clipper v0.92.0 released!

A new version of Sprite Clipper is out there!  v0.92.0 includes a center reshape anchor point, but the most significant new feature is simple drag-select functionality, so you don’t need to click on individual sprite clips if you don’t want to.  Check it out when you get a chance, and keep that feedback coming!