# Improving memory performance in Haskell programs

I ran into a big problem recently in FNIStash – I realized that it was using way more memory than I thought it should.  What followed was a couple weeks of arduous investigation and learning about Haskell.  If you find your program has a memory leak that is not due to thunks, these tips might help you out.  Note that this all applies to the de facto standard compiler, GHC.

## Understanding memory usage

The first step to figuring out why your program is using a lot of memory is to measure it.  You can enable profiling by compiling your program with the -prof and -rtsopts compilation flags.  This enables profiling and allows you to control it from the RTS options passed from the command line when you run the program.  For example,

MyProgram.exe +RTS -hy

will run the program and categorize memory usage by type.  There are lots of categorization options available, such as by module or function.

When your program terminates, you’ll get a .hp file in the same directory as the executable.  You can convert it to a viewable PS or PDF file using

hp2ps -e8in -c MyProgram.hp
ps2pdf MyProgram.ps

The result is something like this.

Memory profile of FNIStash categorize by type.

Here we can see that most of the memory is used up by ARR_WORDS, which is a backing type for ByteString and Text, among other things.  Since FNIStash reads and parses an archive of binary files, this makes sense.  However, the magnitude of the plot maxes out at around 25 MB.  Why, then, does Windows report over 70 MB of memory used?

The discrepancy is due a couple of factors:

1. First, the profiling doesn’t come for free.  Compiling with the -prof option necessitates more memory just for reporting purposes.  This requires approximately 30% more memory than usual.  This overhead isn’t reported on the plot, but the OS sees it.
2. The reported memory usage is only “live memory.”  It doesn’t include old data that has not been collected by the garbage collector yet.  In fact, in very loose terms, the default GHC GC waits until live data grows to the size of old data before collecting it.  This means your program might use twice as much memory as needed.  Note that when the garbage collector does run, it temporarily needs more space to reorganize the live data, so if a large amount of data is being collected at once, you can see memory usage (as reported by the OS) spike up.
3. FNIStash, in particular, has some kind of memory leak.  Right after start up, the OS memory usage was around 66 MB, but crept up to 70 MB after some minor GUI interaction.  It’s actually unclear to me whether the leaked memory is reported in the plot.

So if we discount the leaked memory, then we have 66 MB / 1.3 / 2 = 25 MB, which is right in line with what the graph reports.  Turning off profiling saves an easy 30% of OS memory usage, but what if you want to save more?  How can the memory leak be fixed?  Are you stuck with using twice as much memory as necessary?