This level of debugging adds padding to the beginning and end of each block of memory to check, fills
the block with garbage when initialized and after it is freed, and checks for the
following errors, all of which are detected when (or if) the block is freed:
This list is fixed by number of blocks, and is not segregated by block size. You can adjust the size of the list through the environment variable MALLOC_DEBUG_FREE_LIST_SIZE; it is set to 1000 blocks by default. The runtime speed penalty for this debugging option is minimal, as the allocator needs little extra processing to maintain the list of freed blocks. The memory penalty depends on the size of the blocks in the list and the amount of turnover in block allocation, but can be tuned by adjusting MALLOC_DEBUG_FREE_LIST_SIZE. Keep this environment variable at a reasonably high value, so that a stray write to a freed block has as long a window as possible in which it can still be detected (or, worse, in which it would otherwise land in memory now used by a different, freshly allocated block).
One caveat when using this option: if your program gets an error for a block that was written to after being freed, the call stack leading to the memory call that detected the error isn't meaningful, since that call is dealing with a different block than the one with the error. You need to look at the offending block itself and the call stack that led to its allocation; that information is present in the error message the debugger displays when invoked.
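To make the free-list behavior concrete, here is a minimal sketch of a fixed-count list of freed blocks. It is not the MALLOC_DEBUG implementation: the names FreeListQuarantine, ParkedBlock, and Park are hypothetical, and the sketch assumes that a parked block is filled with the byte 0x55 and is checked for stray writes only when it is evicted to make room for a newer block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <deque>

// Hypothetical sketch; names and structure are assumptions, not the real allocator.
constexpr unsigned char kFreeListFill = 0x55;  // fill byte for blocks parked in the free list

struct ParkedBlock {
    unsigned char* data;
    std::size_t size;
};

class FreeListQuarantine {
public:
    // max_blocks plays the role of MALLOC_DEBUG_FREE_LIST_SIZE (a block count, not a byte count).
    explicit FreeListQuarantine(std::size_t max_blocks) : max_blocks_(max_blocks) {}
    FreeListQuarantine(const FreeListQuarantine&) = delete;
    FreeListQuarantine& operator=(const FreeListQuarantine&) = delete;
    ~FreeListQuarantine() {
        for (const ParkedBlock& b : parked_) std::free(b.data);
    }

    // Parks a block obtained from std::malloc instead of releasing it right away.
    // Returns false if a block evicted to make room was written to after being parked.
    bool Park(void* ptr, std::size_t size) {
        assert(ptr != nullptr);
        unsigned char* bytes = static_cast<unsigned char*>(ptr);
        std::memset(bytes, kFreeListFill, size);
        parked_.push_back({bytes, size});

        bool all_intact = true;
        while (parked_.size() > max_blocks_) {
            ParkedBlock oldest = parked_.front();
            parked_.pop_front();
            if (!IsIntact(oldest)) all_intact = false;
            std::free(oldest.data);  // released for good only now
        }
        return all_intact;
    }

    std::size_t Count() const { return parked_.size(); }

private:
    // A parked block is intact if every byte still holds the fill value.
    static bool IsIntact(const ParkedBlock& b) {
        for (std::size_t i = 0; i < b.size; ++i) {
            if (b.data[i] != kFreeListFill) return false;
        }
        return true;
    }

    std::size_t max_blocks_;
    std::deque<ParkedBlock> parked_;
};
```

Note that the failing Park() call in this sketch is made for a different block than the damaged one, which mirrors the caveat above: the call that detects the error is unrelated to the block that was overwritten.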
The runtime speed penalty for this option can be considerable, especially if your program allocates a large number of blocks or if MALLOC_DEBUG_FREE_LIST_SIZE is set to a large value. For this reason, the MALLOC_DEBUG_CHECK_FREQUENCY environment variable controls how often these consistency checks run. Its value is the number of malloc/realloc/free calls made between consistency checks, and it defaults to 1000. With a value of 1, MALLOC_DEBUG checks on every call to malloc/realloc/free; with 2, on every other call; with 1000, on every thousandth call; and so on. For normal debugging at level 10 or greater, keep this setting relatively high so that runtime performance does not suffer greatly while you still get the benefit of checking blocks that are freed infrequently or not at all. If you run into memory problems and have difficulty pinpointing when and where they occur, lower the value to provide a stricter environment.
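The check-frequency rule can be sketched as a simple call counter. This is only an illustration of the counting described above: CheckScheduler and CheckFrequencyFromEnv are hypothetical names, and treating an unset, unparsable, or zero value as the default of 1000 is an assumption, since the text does not say how such values are handled.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical sketch of "check every Nth malloc/realloc/free call".
class CheckScheduler {
public:
    explicit CheckScheduler(unsigned long frequency)
        : frequency_(frequency == 0 ? 1 : frequency), calls_since_check_(0) {
        assert(frequency_ > 0);
    }

    // Call once per malloc/realloc/free. Returns true when a full
    // consistency check is due on this call.
    bool OnAllocatorCall() {
        if (++calls_since_check_ < frequency_) return false;
        calls_since_check_ = 0;
        return true;
    }

private:
    unsigned long frequency_;
    unsigned long calls_since_check_;
};

// Reads MALLOC_DEBUG_CHECK_FREQUENCY; falls back to 1000 (the documented
// default) when the variable is unset. Falling back for unparsable or zero
// values as well is an assumption of this sketch.
unsigned long CheckFrequencyFromEnv() {
    const unsigned long kDefault = 1000;
    const char* value = std::getenv("MALLOC_DEBUG_CHECK_FREQUENCY");
    if (value == nullptr) return kDefault;
    char* end = nullptr;
    unsigned long parsed = std::strtoul(value, &end, 10);
    if (end == value || parsed == 0) return kDefault;
    return parsed;
}
```

With a frequency of 1 every call triggers a check, with 2 every other call does, and with 1000 three checks run over 3000 allocator calls.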
The call stack is not guaranteed to be accurate; there may be some bogus return addresses in the listing. However, there should be enough information in there for you to determine where the offending block was allocated, and hopefully, what it is and what to do about it.
This call stack feature is always active when MALLOC_DEBUG is turned on; you cannot disable it to free up the memory (28 bytes per block) that the call stack consumes.
If an error is detected in a block, this lock will be held while the error message is printed and the offending thread is dropped into the debugger, so other threads will eventually block on calls to malloc/realloc/free. You may actually find this a desirable feature, as it will halt the other threads (hopefully) close to the source of the error, and may help you debug problems which involve adverse memory interactions between two or more threads. If for some reason this behavior causes problems in an otherwise correct program (please check carefully before you determine that this is the case), then you may need to turn the debugging level back to 1, which does not use this locking.
This is a typical crash log, taken from an x86 build of NetPositive:
    segment violation occurred
    void Image::Reference():
    +0005 8002f142: * 2840ff inc dword eax+0x28
    /cgi-bin/nph-count:sc
    frame retaddr
    fc452d48 8002dbb6 long ImageConsumer::Write(unsigned char *, long) + 00000086
    fc452d58 8004fdc8 void Consumer::GotData() + 000000fd
    fc452e24 8004fc9e void Consumer::MessageReceived(class BMessage *) + 0000003c
    fc452e30 ec0934c6 void BLooper::DispatchMessage(class BMessage *, class BHandler *) + 00000058
    fc452e44 ec093279 void BLooper::task_looper() + 00000143
    fc452e80 ec092f59 long BLooper::_task0_(void *) + 0000001a
    fc452e90 ec0427af thread_start + 00000065
    /cgi-bin/nph-count:regs
    eax 55555555 ebp fc452d48 cs 001b
    edx 801a07b0 esi 80172030 ss 0023
    ecx 55555555 edi 00000000 ds 0023
    ebx 80060ca4 esp fc452d48 es 0023
    fs 0000
    eflags 00010206
    eip 8002f142
    trap_no 0000000e
    error_code 00000006
The Image::Reference() function loads the this pointer into register eax and tries to access data at offset 0x28, but crashes because eax has turned into mush. Why? Because the caller, ImageConsumer::Write(), is calling Reference() on a bad Image pointer. It got that pointer from memory in a block that has been freed and is sitting in the free list. Because the block has been overwritten with carefully selected garbage, the error is caught the moment we try to dereference the now-invalid pointer.
So what is the garbage that gets written into a block? It depends on the circumstances; knowing which kind of garbage gets written when will give you a clue if you see this happening in your program. Here is the list:
Written into a freshly malloc'ed block.
For blocks that are realloc'ed to a larger size, this is written into the new, unused space.
0x55 | Written into a block that has been free'd and is sitting in the free list before being free'd for good (when it will be re-trashed with 0x95, below). This only happens at debugging level 5 and higher.
0x95 | Written into a block before it is free'd for good.
So, referring back to the dump above, the significance of eax being 0x55555555 is that we have tried to take a pointer from a member of a class which has been free'd by our application but is waiting in the free list for final disposal.
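When reading a register dump, a small helper can flag words that look like these fill patterns. This is a hedged sketch: FillKind, ClassifyWord, and Describe are hypothetical names, only the two fill bytes named in this text (0x55 and 0x95) are covered, and the 0x95959595 word is inferred from the 0x95 fill byte rather than taken from a dump.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for interpreting a 32-bit value seen in a crash dump.
enum class FillKind {
    kNotAFillPattern,
    kParkedInFreeList,  // block freed by the program, still waiting in the free list
    kFreedForGood       // block re-trashed just before final disposal
};

// Classifies a 32-bit word by comparing it against the repeated fill bytes.
FillKind ClassifyWord(std::uint32_t word) {
    switch (word) {
        case 0x55555555u: return FillKind::kParkedInFreeList;
        case 0x95959595u: return FillKind::kFreedForGood;  // inferred from the 0x95 fill byte
        default:          return FillKind::kNotAFillPattern;
    }
}

// Returns a short human-readable explanation for a classification.
const char* Describe(FillKind kind) {
    switch (kind) {
        case FillKind::kParkedInFreeList:
            return "read from a freed block that is still in the free list";
        case FillKind::kFreedForGood:
            return "read from a block that has been freed for good";
        case FillKind::kNotAFillPattern:
            return "not a known fill pattern";
    }
    assert(false && "unhandled FillKind");
    return "";
}
```

Applied to the dump above, the eax and ecx values (55555555) classify as free-list garbage, while edx (801a07b0) does not match any pattern.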
With that stern warning out of the way, why would you want to know the details of the MALLOC_DEBUG block headers? Let's say you're debugging a problem where a block gets overwritten after it is freed. Specifically, four bytes at a certain offset within the block are getting overwritten after it is freed. By looking at the diagnostic information that is displayed when the program breaks into the debugger, you can unwind the call stack and determine the identity of the block that is getting overwritten, but when you look at that offset within the block, all you can see is a pointer to an unidentifiable bit of memory. You'd rather not go through header files and count bytes to see which data member it is; isn't there a better way?
Yes, there is. Set a breakpoint or place a debugger call in your code to drop the program into the debugger before the offending object is destroyed, while its data is still valid. Look up the value of the pointer that is getting overwritten. Display memory starting at 48 bytes before the beginning of the block being pointed to, which will give you the MALLOC_DEBUG information for that block. From this information, you can find the block size and the call stack of the code that allocated the block, which should very quickly let you find out what kind of block it is.
Use the following information to decode the block header:
Offset | Size (bytes) | Contents |
0 | 4 | Pre-header padding (0x3975237F for allocated blocks, 0xC3E5B8F9 for freed blocks) |
4 | 4 | Block size (not including header) |
8 | 4 | Pointer to the next block header |
12 | 4 | Pointer to the previous block header |
16 | 28 | Seven levels of call stack, most recent first, four bytes per call |
44 | 4 | Post-header padding (0x792353F7) |
48 | ... | Block data |
... | 4 | Tail padding (0x92753777) |
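The layout above can be mirrored in a struct for use in a debugging aid. This is a sketch under stated assumptions, not the allocator's own declaration: DebugBlockHeader, ReadHeader, ClassifyHeader, and TailIntact are hypothetical names, the pointer fields are stored as 32-bit integers to match the four-byte sizes in the table, values are read in host byte order, and the tail padding is assumed to follow the block data immediately with no alignment gap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the 48-byte header described in the table.
struct DebugBlockHeader {
    std::uint32_t pre_padding;    // offset 0
    std::uint32_t block_size;     // offset 4, size of block data only
    std::uint32_t next_header;    // offset 8, 32-bit pointer
    std::uint32_t prev_header;    // offset 12, 32-bit pointer
    std::uint32_t call_stack[7];  // offset 16, most recent call first
    std::uint32_t post_padding;   // offset 44
};
static_assert(sizeof(DebugBlockHeader) == 48, "header must be 48 bytes");
static_assert(offsetof(DebugBlockHeader, call_stack) == 16, "call stack at offset 16");
static_assert(offsetof(DebugBlockHeader, post_padding) == 44, "post padding at offset 44");

constexpr std::uint32_t kPreAllocated = 0x3975237Fu;
constexpr std::uint32_t kPreFreed     = 0xC3E5B8F9u;
constexpr std::uint32_t kPostPadding  = 0x792353F7u;
constexpr std::uint32_t kTailPadding  = 0x92753777u;

enum class HeaderState { kAllocated, kFreed, kCorrupt };

// Copies out the header that sits 48 bytes before the block data.
DebugBlockHeader ReadHeader(const unsigned char* block_data) {
    assert(block_data != nullptr);
    DebugBlockHeader header;
    std::memcpy(&header, block_data - sizeof(header), sizeof(header));
    return header;
}

// Decides whether the padding words identify an allocated or freed block.
HeaderState ClassifyHeader(const DebugBlockHeader& header) {
    if (header.post_padding != kPostPadding) return HeaderState::kCorrupt;
    if (header.pre_padding == kPreAllocated) return HeaderState::kAllocated;
    if (header.pre_padding == kPreFreed) return HeaderState::kFreed;
    return HeaderState::kCorrupt;
}

// Checks the four bytes assumed to follow the block data directly.
bool TailIntact(const unsigned char* block_data, const DebugBlockHeader& header) {
    std::uint32_t tail = 0;
    std::memcpy(&tail, block_data + header.block_size, sizeof(tail));
    return tail == kTailPadding;
}
```

Copying through memcpy rather than casting the pointer avoids alignment and aliasing problems when the bytes come from a raw memory dump.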
You will get a "Block written to after being freed" exception on a BView that has BScrollBars targeting it when you delete the window that contains the view. If you examine the BView that has been erroneously written to, you will see zeros written into the four bytes starting at offset 0x30 into the block.
Remove the scroll bars from the window and delete them before the view is deleted. If the view is your own subclass of BView, a good place to do this is in the DetachedFromWindow() function. This bug will be fixed in R4, so the workaround is necessary only until then.