25 July 2018

Slow memory allocation due to Transparent Huge Pages (THP)

Keywords: Linux; unusually long runtime; large contiguous memory allocation; RAM fragmentation; Transparent Huge Pages.

Executive summary (aka abstract aka TL;DR):

On the stock kernel used by at least Ubuntu 14.04, turn off Transparent Huge Pages on machines with substantial uptime and larger amounts of memory (>= 32 GiB).
This can result in a speedup of 60x to 100x for programs which need lots of memory in large contiguous chunks. Whether the THP kernel routines have been sufficiently improved in later Linux distributions remains to be seen.

Longer version.

For a number of tasks at work, some bioinformatics programs I use need quite a bit of memory. Lots of it, actually. The machine I am using has 512 GiB of RAM but, as is frequently the case in production environments, the OS is a bit older: an Ubuntu 14.04 LTS with a 4.2.x kernel.

Symptoms

I noticed a very unusual, non-linear increase in run time for some programs, e.g. multiple hours instead of the expected minutes, and started to look for the cause of this performance issue. After a while I began to suspect that memory allocation in Linux was responsible.

A small test program

The following C++ program, once compiled, allocates 160 GiB of RAM in one contiguous chunk, initialises it to all zeros and then exits:

#include <vector>
#include <cstdint>

int main(int argc, char **argv) {
  // Allocate one contiguous block of 160 GiB (160 * 2^30 bytes), zero-initialised.
  std::vector<uint8_t> v(1073741824LL * 160, 0);
  // Return one element so the allocation is actually used.
  return v[1234];
}
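
For reference, compiling and timing such a test needs nothing special; something along these lines will do (the file name bigalloc.cpp is arbitrary):

g++ -O2 -std=c++11 -o bigalloc bigalloc.cpp
time ./bigalloc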

So far, so innocuous: allocating roughly a third of the available RAM should be a no-brainer. However, the behaviour of that program was -- on the otherwise empty machine from above, with 190 days of uptime and no swap -- really odd:
  1. The virtual memory allocation part took, as expected, just a couple of microseconds: the VIRT entry in top showed the expected 160 GiB right after program start.
  2. Within ~20 seconds, the RES entry in top climbed to ~50 GiB. This showed the progress of zeroing out the memory while, at the same time, actually committing RAM pages to the process. This, too, was within expected boundaries.
  3. But after these 20 seconds, it took more than 1 hour(!) for the RES entry to climb to the full 160 GiB and for the program to finally exit.
That felt very wrong.

Cause and cure

Poking around the internet, I quickly came to suspect a memory management mechanism called Transparent Huge Pages (THP); you can read more about it on the Kernel THP documentation pages. The symptoms of the THP machinery not performing "as expected" are manifold, ranging from lags in memory allocation of several seconds or even minutes (as in my case) to outright system freezes and reboots. Accounts and recommendations can be found at MemSQL, ElasticSearch, NuoDB, Couchbase, Oracle and many more.

THP is indeed turned on by default on Ubuntu 14.04 and a couple of other distributions. So, I turned it off (as root) via

echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

and, lo and behold, the small test program from above then finished within just a minute.
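
To check which THP setting is currently active, one can read the same sysfs file; the value in square brackets is the one in effect:

cat /sys/kernel/mm/transparent_hugepage/enabled

Note that the echo commands above do not survive a reboot; to make the setting permanent, the kernel boot parameter transparent_hugepage=never (or an init script writing to those files) can be used.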


Probable underlying cause: memory fragmentation


At least, it was in my case. With the knowledge from the articles cited above, I suspected memory fragmentation to be the root cause of the THP problem and rebooted the server, the expectation being that a freshly booted system has no memory fragmentation. And indeed: even with THP turned on, the memory allocation program from above finished in about 30 seconds after the reboot.
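
A quick, rough way to see how fragmented physical memory is on a running system is to look at /proc/buddyinfo, which lists, per memory zone, how many free blocks of each order are left; on a long-running machine the columns for the higher orders (the larger contiguous blocks, towards the right) tend to drop to zero:

cat /proc/buddyinfo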

Conclusions

For users / system administrators: either turn off THP or, if THP is absolutely needed, reboot the machine regularly to overcome memory fragmentation.
For authors of (bioinformatics or other) software which needs huge contiguous chunks of RAM: at the start of your program, check whether THP is enabled and warn the user; a minimal sketch of such a check follows below.
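
The following is just a sketch of how such a check could look on Linux (the function name and the wording of the warning are of course up to you): it reads /sys/kernel/mm/transparent_hugepage/enabled and warns if the active mode, the one in square brackets, is "always".

#include <fstream>
#include <iostream>
#include <string>

// Return the active THP mode ("always", "madvise" or "never"), i.e. the word
// shown in square brackets in the sysfs file, or an empty string if the file
// cannot be read (non-Linux system, no THP support, ...).
static std::string thp_mode() {
  std::ifstream in("/sys/kernel/mm/transparent_hugepage/enabled");
  std::string line;
  if (!in || !std::getline(in, line)) return "";
  const std::string::size_type l = line.find('[');
  const std::string::size_type r = line.find(']');
  if (l == std::string::npos || r == std::string::npos || r <= l) return "";
  return line.substr(l + 1, r - l - 1);
}

int main() {
  if (thp_mode() == "always") {
    std::cerr << "Warning: Transparent Huge Pages are set to 'always' on this machine.\n"
                 "Large contiguous memory allocations may become extremely slow once\n"
                 "physical memory is fragmented. Consider switching THP to 'madvise' or 'never'.\n";
  }
  return 0;
}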

BaCh, we're done here.