NuMALLOC_DEBUG

The MALLOC_DEBUG mechanism has been expanded to provide more functionality and in general be more useful to software developers. It retains backward-compatibility with the old MALLOC_DEBUG mechanism, but introduces the concept of levels of MALLOC_DEBUGging to give you access to more features (and, consequently, more runtime overhead) if you turn them on.

Debugging levels

You can enable different levels of memory debugging by assigning a number to the MALLOC_DEBUG environment variable instead of just setting it to "true". The debugging levels currently defined are as follows:
1
This is the same as the previous MALLOC_DEBUG functionality. If you use the old-style MALLOC_DEBUG=true definition, it will set the debugging level to 1 by default. You pay a memory penalty of 52 extra bytes per block for storing padding and debugging information, and a very small speed penalty for maintaining the extra information on block allocation and reallocation, and checking the information when the block is freed.

This level of debugging adds padding to the beginning and end of each block of memory so that overwrites can be detected, fills the block with garbage when it is allocated and after it is freed, and checks for the following errors, all of which are detected when (or if) the block is freed:

  • The block has been freed twice.
  • The program has written off the end of the block.
  • The program has written before the beginning of the block. (This may show up as a "bogus size" error.)
5
In addition to all of the checks that are performed at lower debugging levels, this level takes an additional step to detect blocks that are written to after being freed. When a block is freed, instead of calling free() on it immediately, the block is filled with garbage and added to a "free list" where it will remain for some period of time. When the list becomes full, recent blocks push old blocks off of the list; the old blocks are checked to see whether they have been written to since being placed on the list, and are then freed.

This list is fixed by number of blocks, and is not segregated by block size. You can adjust the size of the list through the environment variable MALLOC_DEBUG_FREE_LIST_SIZE; it is set to 1000 blocks by default.

The runtime speed penalty for this debugging option is minimal, as the allocator needs little extra processing to maintain the list of freed blocks. The memory penalty depends on the size of the blocks that are present in the list and the amount of turnover in block allocation, but can be tuned by adjusting MALLOC_DEBUG_FREE_LIST_SIZE. It is recommended that you keep this environment variable at a reasonably high value, so that a freed block stays on the list for as long as possible. Once a block leaves the list, a stray write to it can no longer be detected (or, worse, the application will write into memory now used by a different, freshly-allocated block).

There is something to be aware of when you are using this option: If your program gets an error for a block that has been written to after being freed, then the call stack that leads to the memory call that detected the error isn't meaningful, since that call is dealing with a different block than the one with the error. You need to look at the offending block itself and the call stack that led to its allocation; that information is present in the error message the debugger displays when invoked.
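The free-list behavior described for this level can be modeled in a few lines of C. This is only a sketch of the idea, not the MALLOC_DEBUG implementation: the names Quarantine, quarantine_free() and so on are hypothetical, and the block size is passed in explicitly where a real allocator would read it from the block header. The 0x55 fill value is the one this document lists for blocks waiting on the free list.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define FREED_FILL 0x55  /* fill value documented for blocks on the free list */

typedef struct {
    unsigned char *ptr;
    size_t size;
} FreedBlock;

typedef struct {
    FreedBlock *slots;   /* ring buffer; the oldest entry is at index head */
    size_t capacity;     /* plays the role of MALLOC_DEBUG_FREE_LIST_SIZE */
    size_t count;
    size_t head;
    size_t violations;   /* blocks found modified while on the list */
} Quarantine;

/* Returns 1 on success, 0 if the slot array could not be allocated. */
int quarantine_init(Quarantine *q, size_t capacity) {
    assert(capacity > 0);
    q->slots = malloc(capacity * sizeof *q->slots);
    q->capacity = capacity;
    q->count = q->head = q->violations = 0;
    return q->slots != NULL;
}

/* Returns 1 if every byte of the block still holds the fill value. */
static int still_trashed(const FreedBlock *b) {
    for (size_t i = 0; i < b->size; i++)
        if (b->ptr[i] != FREED_FILL) return 0;
    return 1;
}

/* Check the oldest block for writes, then release it for real. */
static void evict_oldest(Quarantine *q) {
    FreedBlock *old = &q->slots[q->head];
    if (!still_trashed(old)) q->violations++;
    free(old->ptr);
    q->head = (q->head + 1) % q->capacity;
    q->count--;
}

/* Stand-in for free(): trash the block and delay the real free. */
void quarantine_free(Quarantine *q, void *ptr, size_t size) {
    if (ptr == NULL) return;
    if (q->count == q->capacity) evict_oldest(q);
    memset(ptr, FREED_FILL, size);
    size_t tail = (q->head + q->count) % q->capacity;
    q->slots[tail].ptr = ptr;
    q->slots[tail].size = size;
    q->count++;
}

/* Check and release everything that is still on the list. */
void quarantine_destroy(Quarantine *q) {
    while (q->count > 0) evict_oldest(q);
    free(q->slots);
    q->slots = NULL;
}
```

Note that in this model, as in the description above, the write is noticed only when the damaged block is pushed off the list by an unrelated call, which is why the call stack at detection time says nothing about the culprit.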

10
In addition to all of the checks that are performed at lower debugging levels, this level periodically performs those checks on all blocks in the heap and all blocks in the free list, so you can catch errors before the blocks are freed (in the case of heap blocks) or before they fall off the old end of the free list (in the case of freed blocks). This lets you catch memory errors much closer to the time that they actually happen. The consistency checks are triggered from calls to malloc(), realloc(), and free(), at the frequency described below.

The runtime speed penalty for this option can be considerable, especially if your program allocates a large number of blocks or if your MALLOC_DEBUG_FREE_LIST_SIZE environment variable is set to a large value. For this reason, the MALLOC_DEBUG_CHECK_FREQUENCY environment variable is used to control the frequency of these consistency checks. It is defined as the number of malloc/realloc/free calls that are made between consistency checks, and defaults to 1000. If the value of this environment variable is 1, MALLOC_DEBUG will check with every call to malloc/realloc/free; if the value is 2, it will check every other call; if the value is 1000, it will check every thousandth call, and so on.

For normal debugging at level 10 or greater, it is recommended that you keep this setting at a relatively high value, so that it does not greatly impact your runtime performance but still gives you the advantage of checking blocks that are freed infrequently (or not at all). If you run into memory problems and are having difficulty pinpointing when and where they occur, you can set this value lower to provide a stricter environment.
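To make the settings described above concrete, here is a minimal sketch in C of how the environment variables could be interpreted and how the check frequency could be applied. The function and type names are hypothetical, and treating unparsable text as "off" (or as the default) is this sketch's choice rather than documented behavior; only the variable names, the "true" shorthand for level 1, and the default of 1000 come from this document.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Interpret a MALLOC_DEBUG value: unset means off (0), "true" means
   level 1, otherwise the value is read as a number. Falling back to 0
   for anything else is an assumption made by this sketch. */
int parse_debug_level(const char *value) {
    if (value == NULL) return 0;
    if (strcmp(value, "true") == 0) return 1;
    char *end;
    long n = strtol(value, &end, 10);
    return (end != value && *end == '\0' && n > 0) ? (int)n : 0;
}

/* Parse a positive count such as MALLOC_DEBUG_FREE_LIST_SIZE or
   MALLOC_DEBUG_CHECK_FREQUENCY, falling back to the given default. */
long parse_count(const char *value, long fallback) {
    assert(fallback > 0);
    if (value == NULL) return fallback;
    char *end;
    long n = strtol(value, &end, 10);
    return (end != value && *end == '\0' && n > 0) ? n : fallback;
}

typedef struct {
    long frequency;       /* calls between heap-wide checks */
    unsigned long calls;  /* malloc/realloc/free calls seen so far */
} CheckSchedule;

/* Build a schedule from the environment, using the documented default. */
CheckSchedule schedule_from_environment(void) {
    CheckSchedule s;
    s.frequency = parse_count(getenv("MALLOC_DEBUG_CHECK_FREQUENCY"), 1000);
    s.calls = 0;
    return s;
}

/* Call once per malloc/realloc/free; returns 1 when a check is due. */
int check_is_due(CheckSchedule *s) {
    s->calls++;
    return s->calls % (unsigned long)s->frequency == 0;
}
```

With a frequency of 1 every call triggers a check, and with a frequency of 2 every other call does, matching the behavior described above.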

Call stack of block allocation

MALLOC_DEBUG now records the last seven levels of call stack in the malloc() call for each block, so if an error is detected in the block later on, MALLOC_DEBUG will print the call stack in the debugger's error message, latest call first, as a set of seven hexadecimal return addresses. You can turn the addresses into function names by looking them up in the link map for your program or by using the wh command in the debugger.

The call stack is not guaranteed to be accurate; there may be some bogus return addresses in the listing. However, there should be enough information in there for you to determine where the offending block was allocated, and hopefully, what it is and what to do about it.

This call stack feature is always active when MALLOC_DEBUG is turned on; you cannot disable it to free up the memory (28 bytes per block) that the call stack consumes.

Memory overhead

As was previously stated, MALLOC_DEBUG allocates an extra 52 bytes per block to store its debugging information. Currently, none of the higher debugging levels incur any additional per-block storage overhead, though they may consume extra memory in other ways.

Other side-effects

Debugging levels 5 and 10 maintain global lists of blocks; as such, locks must be placed around access to those lists to keep MALLOC_DEBUG thread-safe. This will tend to serialize access to malloc/realloc/free and may change the timing of your program in subtle ways. This change may be a good thing, though, as it may uncover timing problems and race conditions.

If an error is detected in a block, this lock will be held while the error message is printed and the offending thread is dropped into the debugger, so other threads will eventually block on calls to malloc/realloc/free. You may actually find this a desirable feature, as it will halt the other threads (hopefully) close to the source of the error, and may help you debug problems which involve adverse memory interactions between two or more threads. If for some reason this behavior causes problems in an otherwise correct program (please check carefully before you determine that this is the case), then you may need to turn the debugging level back to 1, which does not use this locking.

More about block trashing

At all levels of debugging, MALLOC_DEBUG fills allocated blocks with garbage at appropriate times to catch errors where your application reads data from an uninitialized or freed block. In C++ programs, this typically shows up as a crash inside a member function, the first time the function tries to access one of the object's data members.

This is a typical crash log, taken from an x86 build of NetPositive. The interesting parts of the dump are the faulting instruction in Image::Reference() and the values in the eax and ecx registers:

segment violation occurred
void Image::Reference():
+0005  8002f142:   *        2840ff    inc    dword eax+0x28
/cgi-bin/nph-count:sc
   frame         retaddr
fc452d48   8002dbb6  long ImageConsumer::Write(unsigned char *, long) + 00000086
fc452d58   8004fdc8  void Consumer::GotData() + 000000fd
fc452e24   8004fc9e  void Consumer::MessageReceived(class BMessage *) + 0000003c
fc452e30   ec0934c6  void BLooper::DispatchMessage(class BMessage *, class BHandler *) + 00000058
fc452e44   ec093279  void BLooper::task_looper() + 00000143
fc452e80   ec092f59  long BLooper::_task0_(void *) + 0000001a
fc452e90   ec0427af  thread_start + 00000065
/cgi-bin/nph-count:regs
 eax 55555555   ebp fc452d48   cs 001b
 edx 801a07b0   esi 80172030   ss 0023
 ecx 55555555   edi 00000000   ds 0023
 ebx 80060ca4   esp fc452d48   es 0023
                               fs 0000
 eflags 00010206  eip 8002f142
 trap_no 0000000e  error_code 00000006
The Image::Reference() function loads the this pointer into register eax and is trying to access some data at offset 0x28, but is crashing because eax has turned into mush. Why? Because the caller, ImageConsumer::Write(), is trying to call Reference() on a bad Image pointer. It got that pointer from some memory in a block that has been freed and is sitting in the free list. Since the block has been overwritten with carefully selected garbage, the moment we try to dereference the now invalid memory, the error is caught right away.

So what is the garbage that gets written into a block? It differs depending on the circumstances; knowing which kind of garbage gets written when will be a clue if you see this happening in your program. Here is the list:

0xD7
Written into a freshly malloc'ed block.
0x3F
For blocks that are realloc'ed to a larger size, this is written into the new, unused space.
0x55
Written into a block that has been free'd and is sitting in the free list before being free'd for good (when it will be re-trashed with 0x95, below). This only happens at debugging level 5 and higher.
0x95
Written into a block before it is free'd for good.

So, referring back to the dump above, the significance of eax being 0x55555555 is that we have tried to take a pointer from a member of a class which has been free'd by our application but is waiting in the free list for final disposal.
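When you are reading register dumps like the one above, the lookup can be mechanized. The following C sketch classifies a 32-bit value against the four fill bytes listed in this section; the enum and function names are hypothetical, and the rule that all four bytes must match is an assumption of the sketch (a partially overwritten word is reported as unrecognized).

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    FILL_NONE,            /* not a recognized fill pattern */
    FILL_FRESH_MALLOC,    /* 0xD7: freshly malloc'ed block */
    FILL_REALLOC_GROWTH,  /* 0x3F: new space after a growing realloc */
    FILL_ON_FREE_LIST,    /* 0x55: freed, waiting on the free list */
    FILL_FREED            /* 0x95: freed for good */
} FillKind;

/* Map a single byte to the fill kind documented for it. */
FillKind classify_fill_byte(uint8_t b) {
    switch (b) {
    case 0xD7: return FILL_FRESH_MALLOC;
    case 0x3F: return FILL_REALLOC_GROWTH;
    case 0x55: return FILL_ON_FREE_LIST;
    case 0x95: return FILL_FREED;
    default:   return FILL_NONE;
    }
}

/* A word counts as a fill pattern only if all four bytes are identical. */
FillKind classify_word(uint32_t word) {
    uint8_t b = (uint8_t)(word & 0xFFu);
    uint32_t repeated = b * 0x01010101u;
    if (word != repeated) return FILL_NONE;
    return classify_fill_byte(b);
}

/* Human-readable explanation for a fill kind. */
const char *describe_fill(FillKind kind) {
    static const char *const text[] = {
        "not a recognized fill pattern",
        "read from a freshly malloc'ed, uninitialized block",
        "read from unused space added by realloc",
        "read from a freed block still on the free list",
        "read from a block that was freed for good"
    };
    assert((int)kind >= 0 && (int)kind <= (int)FILL_FREED);
    return text[kind];
}
```

For the dump above, classify_word(0x55555555) yields FILL_ON_FREE_LIST, which is the same conclusion reached by hand.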

More about the per-block debugging information

Here are the details about the extra information that MALLOC_DEBUG stores for each block. We are giving you this information in case you need to examine memory directly and would like to understand some of the information you find in the block headers; however, please, please note that this data is for INFORMATIONAL PURPOSES ONLY and is subject to change.

With that stern warning out of the way, why would you want to know the details of the MALLOC_DEBUG block headers? Let's say you're debugging a problem where a block gets overwritten after it is freed. Specifically, four bytes at a certain offset within the block are getting overwritten after it is freed. By looking at the diagnostic information that is displayed when the program breaks into the debugger, you can unwind the call stack and determine the identity of the block that is getting overwritten, but when you look at that offset within the block, all you can see is a pointer to an unidentifiable bit of memory. You'd rather not go through header files and count bytes to see which data member it is; isn't there a better way?

Yes, there is. Set a breakpoint or place a debugger call in your code to drop the program into the debugger before the offending object is destroyed, while its data is still valid. Look up the value of the pointer that is getting overwritten. Display memory starting at 48 bytes before the beginning of the block being pointed to, which will give you the MALLOC_DEBUG information for that block. From this information, you can find the block size and the call stack of the code that allocated the block, which should very quickly let you find out what kind of block it is.

Use the following information to decode the block header:

Offset  Size  Description
0       4     Pre-header padding (0x3975237F for allocated blocks, 0xC3E5B8F9 for freed blocks)
4       4     Block size (not including header)
8       4     Pointer to the next block header
12      4     Pointer to the previous block header
16      28    Seven levels of call stack, most recent first, four bytes per call
44      4     Post-header padding (0x792353F7)
48      ...   Block data
...     4     Tail padding (0x92753777)
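The layout above maps directly onto a C structure, which can be handy if you dump raw memory and want to decode it offline. This is a sketch under stated assumptions: the names BlockHeader and inspect_block() are hypothetical, pointers are stored as 32-bit values to match the four-byte fields, the bytes are assumed to be in the host's byte order, and the order of the checks is this sketch's choice. Since the header layout is subject to change, treat it as an illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PRE_PAD_ALLOCATED 0x3975237Fu
#define PRE_PAD_FREED     0xC3E5B8F9u
#define POST_PAD          0x792353F7u
#define TAIL_PAD          0x92753777u

typedef struct {
    uint32_t pre_padding;    /* offset 0 */
    uint32_t block_size;     /* offset 4, excludes the header */
    uint32_t next_header;    /* offset 8, 32-bit address */
    uint32_t prev_header;    /* offset 12, 32-bit address */
    uint32_t call_stack[7];  /* offset 16, most recent call first */
    uint32_t post_padding;   /* offset 44 */
} BlockHeader;               /* 48 bytes; block data follows */

typedef enum {
    BLOCK_OK,
    BLOCK_FREED,       /* header carries the freed-block marker */
    BLOCK_BAD_HEADER,  /* padding damaged or size implausible */
    BLOCK_BAD_TAIL     /* something wrote past the end of the block */
} BlockStatus;

/* Decode a raw dump that starts at the block header (48 bytes before
   the block data) and report what the padding values say about it. */
BlockStatus inspect_block(const unsigned char *raw, size_t raw_len,
                          BlockHeader *out) {
    assert(raw != NULL && out != NULL);
    assert(sizeof(BlockHeader) == 48);
    if (raw_len < sizeof *out + 4) return BLOCK_BAD_HEADER;
    memcpy(out, raw, sizeof *out);
    if (out->post_padding != POST_PAD) return BLOCK_BAD_HEADER;
    if (out->pre_padding == PRE_PAD_FREED) return BLOCK_FREED;
    if (out->pre_padding != PRE_PAD_ALLOCATED) return BLOCK_BAD_HEADER;
    /* A size that does not fit in the dump is treated as a bogus size. */
    if (out->block_size > raw_len - sizeof *out - 4) return BLOCK_BAD_HEADER;
    uint32_t tail;
    memcpy(&tail, raw + sizeof *out + out->block_size, sizeof tail);
    return tail == TAIL_PAD ? BLOCK_OK : BLOCK_BAD_TAIL;
}
```

After a successful decode, out->block_size and out->call_stack[0] through out->call_stack[6] give you the block size and the allocation call stack discussed above.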

Common problems with NuMALLOC_DEBUG

Since the version of MALLOC_DEBUG that we used to test R3.1 was not as strict as this version, the new MALLOC_DEBUG may reveal bugs in the kits and operating system (particularly if you run MALLOC_DEBUG system-wide via the UserSetupEnvironment file, which is not recommended with the new MALLOC_DEBUG under R3.1). Here are known kit bugs that you may find:

Problem
You will get a "Block written to after being freed" exception on a BView that has BScrollBars targeting it when you delete the window that contains the view. If you examine the BView that has been erroneously written to, you will see zeros written into the four bytes starting at offset 0x30 into the block.
Workaround
Remove the scroll bars from the window and delete them before the view is deleted. If the view is your own subclass of BView (or a subclass thereof), a good place to do this is in its DetachedFromWindow() function. This bug will be fixed in R4, so the workaround is necessary only until then.