A Heap of Risk - The H Security: News and Features

In association with heise online

The heap

Unlike other data areas, the buffers on the heap are dynamic. When a program is written, the developer or the compiler does not know which data will be saved there; the program is therefore designed to adjust the space needed whilst the program is running. Depending on the language, platform and purpose of the software, there are a number of functions that are used to allocate memory:

malloc(), calloc(), and realloc() are the most commonly used functions in ANSI-C code.
In C++, the operator new reserves memory on the heap both for arrays and instances of objects.
Delphi programmers use GetMem() and New().
A number of platforms offer non-portable functions, such as LocalAlloc() and HeapAlloc() on Windows.

A call to these functions reserves memory in the virtual address space of the process and then returns an address to the process for that allocated memory. Programmers can use this memory any way they want, but they also have to release it again when the time comes.

Typically, the operating system gives every process a heap of a specific size, although this size can be changed while the program is running. Usually, a system library that is mapped into the address space of the process when it is loaded handles this management procedure and divides the heap up into smaller sectors that the programmer can handle more conveniently.

If the limits of the memory space provided by the kernel are exceeded, an access violation occurs, and this generally terminates the process. However, there are no such limits within this memory space, and neither the CPU nor the kernel can recognize whether an overflow has occurred within the heap.

The amount of space that the kernel requires for a heap depends on which program is being executed. A number of data formats for executable files contain information about how big the heap is expected to be. The PE and PE32 Windows formats for executable files contain this information in what is called the "optional header" -- which is anything but optional. Indeed, it contains information about how much memory is to be reserved and how much is needed immediately both for the stack and the heap. The ELF32 file format primarily used for Linux contains similar specifications.

Like the stack, the heap also contains management information along with usable data. As a result, buffer overflow attacks on the heap basically work the same as they do on the stack: attackers can overwrite specific management data to change the program flow in order to inject and execute their own code.

A simple heap

Every runtime environment has its own heap management adapted to the special features of the system. In principle, a program can however, also implement its own memory management.

In the following, we will use a simple kind of heap management that is probably not found in practice but does help to understand the basic techniques. This sample implementation, which we will call SimpleHeap, consists of a doubly linked list that contains reserved blocks of memory. Each memory block begins with a header containing management data: pointers to the next and the previous memory blocks, the size of reserved memory, and a marker showing whether the memory block is in use. Then comes the actual memory area. In C, all of this looks as follows:

typedef struct __HeapHdr__ {
<ul class="see-also">
<li>truct __HeapHdr__ *next;</li>
<li>truct __HeapHdr__ *prev;</li>
</ul>
   unsigned int  size;
   unsigned int  used;
   // Usable data area starts here
} HeapHdr_t;

When the program is launched, SimpleHeap gets sufficient memory from the operating system, which it then manages on its own. Initialization creates a root element (or, root node), which contains all available memory:

root = (HeapHdr_t *) memory;
root->next = NULL;
root->prev = NULL;
root->size = initial_size - sizeof(HeapHdr_t);
root->used = 0;

All that SimpleHeap needs now are functions to allocate and release memory. These can be implemented without any refinements and thereby keep the functions simple.

The allocation function SimpleHeap_alloc() searches for an available block of sufficient size and uses however much of it is needed. The function SimpleHeap_free() is also very straightforward -- it frees memory by taking the pointer to the data block and marking it in the appropriate header as free by setting the used field to 0:

<tt>SimpleHeap_alloc()</tt> allocates a block of memory.

Such a simple implementation would, however, have a severe drawback: the heap would become increasingly fragmented the more it was used. Whenever a section is allocated, a new block is cordoned off from the overall heap. While SimpleHeap_free() does release it again, it does not merge it with the original free block. If a new, larger block is to be allocated after this release, SimpleHeap will not be able to use the released block even if it would be big enough when merged with some other adjacent free space.

SimpleHeap_free() without the concatenation of freed blocks.

For this reason, the function SimpleHeap_free () in the listing merges consecutive free blocks and detaches the last one from the doubly linked list. To do this it has to manipulate two pointers: the next pointer in the first free block and the prev pointer in the block after the next:

hdr->next=hdr->next->next; 
hdr->next->next->prev=hdr->next->prev;

Most heap implementations work along similar lines. In some cases, however, free memory blocks are released and merged separately so that merges occur after a slight delay.

SimpleHeap_free with merge
When releasing memory SimpleHeap merges consecutive freed blocks. The pointers indicated by dotted lines are deleted.