A heap of risk

Buffer overflows on the heap and how they are exploited

On average, nearly half of all critical security holes are due to heap overflows. As with its older brother, the stack-based buffer overflow, attackers can use this kind of error to inject and execute any code they want. Even seemingly harmless image files can turn out to be dangerous Trojan horses.

Viewing an image file should be a harmless undertaking, and you would not expect to be able to catch a virus or Trojan in the process. After all, no program code is executed when the image viewer merely interprets the data in the image file to set the colours of a few pixels on your screen. Just about the worst thing you would imagine could happen is that a corrupted file would produce pixel salad on your monitor.

But if the viewer program was not written carefully, invalid structures in the image file can lead to a buffer overflow: data takes up more space than the programmer provided for, so that adjacent memory is overwritten with data that does not belong there. In the best case, the program merely crashes or freezes and is terminated by the operating system. In the worst case, however, the image file has been cleverly manipulated to smuggle machine code into the system, which the image viewer then executes. An error in the Windows library gdiplus.dll, which is needed to display JPG files, made exactly this possible. In a security context, taking advantage of such a programming error for malicious purposes is called an exploit.
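To make "more space than the programmer provided for" concrete, here is a minimal sketch in C of the check that is typically missing in such cases. It is not the gdiplus.dll code; the comment field, the function name `copy_comment` and the buffer size `COMMENT_MAX` are hypothetical and chosen only for illustration. The point is that a length value read from the file is controlled by whoever created the file and has to be validated before anything is copied.

```c
#include <stddef.h>
#include <string.h>

#define COMMENT_MAX 64  /* hypothetical buffer size chosen for this sketch */

/*
 * Copies a comment field from raw file data into a fixed-size buffer.
 * declared_len is the length the file claims for the field, so it must be
 * treated as untrusted. Returns 0 on success, -1 if the field is rejected.
 */
int copy_comment(char dst[COMMENT_MAX], const unsigned char *src,
                 size_t src_len, size_t declared_len)
{
    /* The file claims more data than it actually contains. */
    if (declared_len > src_len)
        return -1;

    /* The field would not fit, counting the terminating NUL byte. */
    if (declared_len >= COMMENT_MAX)
        return -1;

    memcpy(dst, src, declared_len);
    dst[declared_len] = '\0';
    return 0;
}
```

A viewer that skipped the two checks and called `memcpy` directly would write past the end of `dst` as soon as a file declared a comment longer than the buffer, which is exactly the overflow described above.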

Buffer overflows are a fundamental security risk. They can occur not only with image files, but potentially whenever an application interprets data that may not be trustworthy -- and whenever the programmer has failed to take something into consideration. Depending on where the overflow takes place, it is called either a stack overflow or a heap overflow. Such errors used to occur only on the stack, which is why the term buffer overflow is now generally used synonymously with stack overflow.

Unfortunately, heap overflows are becoming increasingly common, forcing us to be more careful with our terminology. If you want to be more precise, you should really speak of "buffer overflows on the heap" or "heap-based buffer overflows" because it is not the heap itself that overflows, but only a buffer that is stored there. In the following, we will nevertheless stick to the short, and commonly accepted, term "heap overflow."

To understand what effects a heap overflow has and, in particular, how it can be exploited, we first have to take a closer look at memory management. A program can store variables -- and hence also buffers -- in three different parts of memory: the data area, the stack, and the heap. The data area contains static variables. Although they can change while a program is running and are therefore theoretically vulnerable to attack, in practice there are hardly any attempts to cause buffer overflows here.
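The three memory areas can be illustrated with a short C sketch that places one buffer in each of them and prints where it ended up. The names `static_buf`, `show_storage_regions` and `BUF_SIZE` are made up for this example, and the addresses printed will differ from system to system and from run to run.

```c
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 16  /* arbitrary size chosen for this sketch */

/* Data area: a static buffer exists for the whole run of the program. */
static char static_buf[BUF_SIZE];

/*
 * Puts a label into one buffer per memory area and prints each address.
 * Returns 0 on success, -1 if the heap allocation fails.
 */
int show_storage_regions(void)
{
    char stack_buf[BUF_SIZE];           /* stack: gone when the function returns */
    char *heap_buf = malloc(BUF_SIZE);  /* heap: lives until free() is called */

    if (heap_buf == NULL)
        return -1;

    /* snprintf never writes more than BUF_SIZE bytes into a buffer. */
    snprintf(static_buf, BUF_SIZE, "data area");
    snprintf(stack_buf, BUF_SIZE, "stack");
    snprintf(heap_buf, BUF_SIZE, "heap");

    printf("%-9s buffer at %p\n", static_buf, (void *)static_buf);
    printf("%-9s buffer at %p\n", stack_buf, (void *)stack_buf);
    printf("%-9s buffer at %p\n", heap_buf, (void *)heap_buf);

    free(heap_buf);
    return 0;
}
```

On most systems the three addresses lie far apart, which reflects the fact that the data area, the stack and the heap are managed separately.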

The compiler places the local variables of C functions, which programmers often use as buffers for input data, on the stack. The stack also has another important function: this is where the CPU saves the return address when a subroutine is called. If an attacker manages to overwrite this address with another value by means of a buffer overflow, the CPU will not resume the program at the instruction following the subroutine call, but will jump to whatever address is now stored there. That value is no longer the original one and instead points to the code that has been smuggled in. An exploit is born.

Stack layout: On the stack, management information is mixed together with a program's usable data.

The basic problem is that systems mix program variables and buffers together with management data. If program data overflows, management information may therefore be overwritten, and this can have unexpected effects on how the program runs. This makes it relatively easy to use a buffer overflow to change the flow of a program so that an attacker's code is executed. Programmers cannot change this general behaviour as they cannot control, for example, where the CPU stores a return address.
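The following C sketch models this mixing of program data and management data without actually corrupting a real stack. It is a toy model under stated assumptions: `struct toy_frame` is not a real stack frame layout, the 8-byte `saved_return` field merely stands in for a return address, and all names are hypothetical. The copy is clamped to the size of the toy structure so that the sketch itself stays within its own memory.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of a stack frame; NOT a real frame layout. */
struct toy_frame {
    unsigned char buffer[8];        /* program data: an input buffer */
    unsigned char saved_return[8];  /* management data: stand-in for a return address */
};

/* The model assumes the management data sits directly behind the buffer. */
_Static_assert(offsetof(struct toy_frame, saved_return) == 8,
               "saved_return must directly follow buffer");

/*
 * Mimics a copy that does not check the size of buffer: it writes len
 * bytes starting at the first byte of the frame, which is buffer[0].
 * Only the size of the whole toy frame limits the copy.
 */
void unchecked_copy(struct toy_frame *frame,
                    const unsigned char *input, size_t len)
{
    if (len > sizeof *frame)
        len = sizeof *frame;
    memcpy(frame, input, len);
}

/* Returns 1 if the stand-in return address still has its original value. */
int return_address_intact(const struct toy_frame *frame,
                          const unsigned char original[8])
{
    return memcmp(frame->saved_return, original, 8) == 0;
}
```

Copying 8 bytes of input leaves `saved_return` untouched, whereas copying 16 bytes replaces it with bytes taken from the input. In a real stack overflow, those bytes would be an address chosen by the attacker.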


Permalink: http://h-online.com/-747161