String Keys Suck

In my previous post, I mentioned that I needed 95 MB of memory just to run my simple test of extracting data from the Torchlight 2 PAK file.  I did some investigating to figure out what the heck was going on.  The culprit: using a FilePath (which is just a String) as the key to my map.

Prepping for profiling

To profile in Haskell with GHC, you need to compile your program with the -prof option, and throw in -auto-all to automatically add cost centers to the profile output.  You then execute the program with some additional runtime flags (such as the +RTS -p -hc used below) to tell the runtime to collect profiling data.  After that, you can look at the resulting .prof file for nice tabular data, but I prefer graphs.  There are a few annoying steps to this whole process, so I created this batch script, which I named hprofile, to handle most of it for me.  It runs the exe, generates the products, and tags the prof and graph with a description.

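@echo off
REM Usage: hprofile "description" exe-name [program and RTS args...]
REM The quoted file names below are inferred from the exe name and newname.
%2.exe %3 %4 %5 %6 %7 %8 %9
hp2ps -e8in -c "%2.hp"
DEL %2.hp
set ID=%~1
set newname=%2(%ID%)
IF EXIST "%newname%.prof" (DEL "%newname%.prof")
RENAME "%2.prof" "%newname%.prof"
IF EXIST "%newname%.ps" (DEL "%newname%.ps")
RENAME "%2.ps" "%newname%.ps"
DEL "%2.aux"
CALL ps2pdf "%newname%.ps"
DEL "%newname%.ps"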
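
If -auto-all produces more cost centers than you want to wade through, you can also place one by hand with an SCC pragma.  Here is a minimal sketch; sumSquares is a made-up workload for illustration and has nothing to do with the PAK code.  The build and run lines in the comments are assumptions about a typical setup: newer GHC versions spell -auto-all as -fprof-auto, and you will likely need -rtsopts before the runtime accepts -p and -hc.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical workload, used only to have something to attribute cost to.
-- The SCC pragma names a cost center; it is ignored when building without -prof.
sumSquares :: Int -> Int
sumSquares n = {-# SCC "sumSquares" #-} foldl' (+) 0 [x * x | x <- [1 .. n]]

-- Assumed build and run steps (adjust names to your project):
--   ghc -prof -auto-all -rtsopts Main.hs
--   Main +RTS -p -hc
main :: IO ()
main = print (sumSquares 1000)
```

With a profiling build, the cost of the fold shows up under the sumSquares name in the .prof file and in the heap graph.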

Baseline – with String keys

Here’s the graph resulting from

>hprofile "baseline" FNIStash-Debug +RTS -p -hc

Memory usage for String keys in Map

The forText2/pakFileList band corresponds to the function that generates the keys in the Map.  In this case, the keys are Strings (FilePaths).

Improvement – with Text keys

I changed the type of the key in the map from FilePath to Text.  This actually made a lot of sense since I parse the paths out as Text anyway; I had only chosen FilePath so I could use the path handling utilities in System.FilePath.  The lookup function on the map still takes a FilePath as the key.  Now, however, the FilePath is converted to Text within the lookup function.  Here is the result.

Memory usage for Text keys in Map

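No more runaway memory usage!  A String is a linked list of Chars, so every character drags along several machine words of overhead, while Text packs its characters into a single array.  The moral of the story: avoid String.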
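
For the curious, the Text-keyed map with a FilePath-taking lookup can be sketched roughly like this.  It is not the actual FNIStash code: PakIndex, buildIndex, lookupEntry, the Int payload and the example paths are all invented for illustration, and I am assuming Data.Map.Strict from containers and Data.Text from text.

```haskell
module Main where

import qualified Data.Map.Strict as M
import           Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical index type: archive path -> some payload (an Int offset here).
type PakIndex = M.Map Text Int

-- The keys arrive as Text from the parser, so they go into the map unchanged.
buildIndex :: [(Text, Int)] -> PakIndex
buildIndex = M.fromList

-- Callers still hand over a FilePath; it is packed into Text only at lookup time.
lookupEntry :: FilePath -> PakIndex -> Maybe Int
lookupEntry path = M.lookup (T.pack path)

main :: IO ()
main = do
  let index = buildIndex
        [ (T.pack "MEDIA/UNITS/a.dat", 0)
        , (T.pack "MEDIA/UNITS/b.dat", 128)
        ]
  print (lookupEntry "MEDIA/UNITS/b.dat" index)  -- Just 128
  print (lookupEntry "missing.dat" index)        -- Nothing
```

The only String that ever exists is the short-lived one passed to the lookup; the long-lived keys held by the map are all Text.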