
Less is more

Therefore, it is crucial that the amount of input data to be tested in fuzzing - also called the "input space" - be kept to a minimum. The smaller the input space, the greater the probability that a program can be crashed in a reasonable time.

Efficient fuzzing thus requires a sound analysis of the data format or protocol used by the program. Such analyses often give testers an idea of where program flaws can be expected and which fields in the input data and protocols are interesting for the test. Such information can be gathered, for example, from documentation, the source code, disassembly, or logs of network traffic - which is why there are so many highly specialized fuzzing tools like CSS-Die for the style sheet support of browsers, AXman for IE's ActiveX API, and IRCfuzz for chat programs (see table). The article "Axt im Walde" at heise Security deals with a select group of such tools and their use in practice.

Completely random fuzzing
Finding one of the few error conditions that exist in the input data space is unlikely, and therefore time-consuming, if you have no idea what you're looking for.

Such considerations lead directly to the opposite extreme of fuzzing: data mutation. Here, the basic idea is to change known, valid input data at randomly selected places. The assumption is that, because most of the data remains valid, the mutated input still passes through most of the program's procedures and can provoke errors there. Generally, this method works without any intricate knowledge of the data format.
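To illustrate the idea, here is a minimal mutation-fuzzing sketch in Python; the sample file "sample.jpg", the target program "./viewer" and all parameters are placeholders for this example, not part of any real tool:

    import random
    import subprocess

    def mutate(data, flips=16):
        # Overwrite a few randomly chosen bytes with random values.
        buf = bytearray(data)
        for _ in range(flips):
            buf[random.randrange(len(buf))] = random.randrange(256)
        return bytes(buf)

    sample = open("sample.jpg", "rb").read()   # any known-valid input

    for round_no in range(1000):
        with open("fuzzed.jpg", "wb") as f:
            f.write(mutate(sample))
        try:
            result = subprocess.run(["./viewer", "fuzzed.jpg"], timeout=5)
        except subprocess.TimeoutExpired:
            continue                           # a hang is also worth logging
        if result.returncode < 0:              # killed by a signal, e.g. SIGSEGV
            print("crash in round", round_no)
            break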

Data mutation
If you mutate enough valid input data, your success rate rises quickly. However, flaws that only heavily mutated data would reveal are hard to find with this method.

Yet preparation also helps in data mutation. Completely random mutation of image data, for instance, will generally only cause pixel errors, which rarely make the program crash. Likewise, you need to analyze the data format to figure out how length fields have to be corrected when a mutation changes the length of the data. On the other hand, this method is almost blind to flaws that only heavily mutated data would reveal. Practical approaches therefore pursue a compromise between random fuzzing and pure data mutation.
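The following Python sketch shows what correcting such a length field can look like; the format - a 4-byte big-endian payload length followed by the payload - is invented here purely for the example:

    import struct

    def fix_length_field(packet):
        # Recompute the 4-byte big-endian length header so it matches
        # the (possibly mutated) payload that follows it.
        payload = packet[4:]
        return struct.pack(">I", len(payload)) + payload

    # After a mutation inserted bytes, the old header value 5 is stale:
    mutated = b"\x00\x00\x00\x05" + b"hello, world!"
    print(fix_length_field(mutated))   # header now reads 13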

Sliced or whole?

In a pioneering paper, security expert Dave Aitel describes a universal analysis method for network protocols: block-based protocol analysis [4]. Most common web protocols are transmitted as a sequence of blocks consisting of length indications and data fields, even when they are nested inside each other, such as CSS inside HTML inside HTTP. That is where this approach comes in.

Block-based protocol analysis
The basic idea behind block-based protocol analysis: even data structures nested within each other can be represented as a linear sequence of bytes.

You start by writing a program that, on each round, outputs a packet in the protocol format filled with varying data. Function calls that output appropriate content - such as integers, strings, or constants - replace the protocol's individual data fields. Calls to block functions then group these data fields into blocks, allowing length indicators within the blocks to be adjusted automatically by means of unique block names.
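The following Python sketch shows the principle of such block functions. The class and its method names are modelled loosely on this idea and are not Spike's actual API:

    import struct

    class BlockBuilder:
        def __init__(self):
            self.parts = []            # (kind, value) in output order
            self.open_blocks = []

        def s_string(self, s):
            self.parts.append(("data", s.encode()))

        def s_block_start(self, name):
            self.open_blocks.append(name)
            self.parts.append(("start", name))

        def s_block_end(self, name):
            assert self.open_blocks.pop() == name
            self.parts.append(("end", name))

        def s_block_size_u32(self, name):
            # Placeholder for a 4-byte length field; its value is filled
            # in automatically once the named block's size is known.
            self.parts.append(("size", name))

        def render(self):
            # Every part has a fixed length (size fields are always 4
            # bytes), so all block sizes can be computed in a first pass.
            part_len = {"data": len, "size": lambda v: 4,
                        "start": lambda v: 0, "end": lambda v: 0}
            sizes, started_at, pos = {}, {}, 0
            for kind, value in self.parts:
                if kind == "start":
                    started_at[value] = pos
                elif kind == "end":
                    sizes[value] = pos - started_at[value]
                pos += part_len[kind](value)
            # Second pass: emit the bytes, resolving the size fields.
            out = bytearray()
            for kind, value in self.parts:
                if kind == "data":
                    out += value
                elif kind == "size":
                    out += struct.pack(">I", sizes[value])
            return bytes(out)

    b = BlockBuilder()
    b.s_block_size_u32("body")         # length field preceding the block
    b.s_block_start("body")
    b.s_string("USER alice")
    b.s_block_end("body")
    print(b.render())                  # the length field reads 10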

There are basically two types of block-based protocol analysis. The more time-consuming, but also more successful, method is the complete re-mapping of a well-analyzed protocol by means of appropriate function calls. The input space can then be selected very carefully for fuzzing. The much faster alternative is to fish individual blocks and data fields that look promising out of a captured data packet and replace them with the required function calls; the rest of the packet is simply retained as a constant value. This approach basically amounts to data mutation in selected packet fields.
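A sketch of that second approach in Python; the captured packet and the field offsets are hypothetical:

    import random

    # A few classic strings that provoke crashes; far from exhaustive.
    FUZZ_STRINGS = [b"A" * 5000, b"%n%n%n%n", b"\x00", b"\xff" * 64]

    def fuzz_field(captured, start, end):
        # Keep the captured packet constant except for one chosen field.
        return captured[:start] + random.choice(FUZZ_STRINGS) + captured[end:]

    # Hypothetical capture: bytes 5 to 10 hold the user name field.
    captured = b"USER alice\r\n"
    print(fuzz_field(captured, 5, 10))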

In practice, testers generally begin with the second method, which does not force them to deal with the protocol in detail beforehand. Length indicators have proven to be both easy to identify and especially productive: when fuzzed, they are relatively likely to trigger buffer overflows. Testers can then work through the source code and the binary field by field towards the complete functional representation of the protocol that the first method would have produced. Usually, critical programming flaws pop up long before that point.
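As an illustration, a handful of boundary values often tried on length indicators - a typical, by no means exhaustive, selection:

    import struct

    # Zero, off-by-one candidates, sign-flip values, and maxima that can
    # overflow size arithmetic in the target.
    LENGTH_VALUES = [0, 1, 0x7f, 0x80, 0xff, 0x7fff, 0x8000, 0xffff,
                     0x7fffffff, 0x80000000, 0xffffffff]

    def with_length(packet, offset, value):
        # Overwrite a 4-byte big-endian length field at the given offset.
        return packet[:offset] + struct.pack(">I", value) + packet[offset + 4:]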

Based on this method, Aitel developed his Spike fuzzing framework, written in C. It provides functions for various data types and block structures in addition to a number of network functions. To fuzz data fields, it uses randomly chosen values as well as certain strings and values that typically lead to crashes as a result of insufficient filtering or integer overflows. The framework also includes reconstructions of various protocols, such as MS-RPC, CIFS, IMAP and PPTP.

Block-based protocol analysis is not, however, limited to Spike. It can also serve as the basis for fuzzing tools you develop yourself in other programming languages, and its principles apply beyond network protocols, for instance to file format fuzzing; other fuzzing frameworks also make use of the method.
