My instincts tell me that there's more going on here. I remember that shellcode is quite often encoded, first to hide suspicious code, and second for practical reasons. For instance, certain characters cannot be used in many cases, such as ASCII 0, which signals the end of a string and therefore cuts off the rest of the code. On the other hand, a number of command sequences and memory addresses contain the value 0. As a workaround, you can XOR your code with a fixed value and prepend an appropriate decoding routine that undoes everything. But you have to choose a key value that does not occur in the code itself; otherwise, XORing that byte with the key produces a zero again.
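The encoding trick can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not code from any particular exploit: the helper names pick_xor_key and xor_bytes are hypothetical, and the sample bytes are the encoding of sub esp, 120h, which contains zero bytes.

```python
def pick_xor_key(payload: bytes) -> int:
    """Return a one-byte key that does not occur in the payload.

    b ^ key is zero only when b == key, so a key that is absent from
    the payload guarantees a zero-free encoded result.
    """
    used = set(payload)
    for key in range(1, 256):
        if key not in used:
            return key
    raise ValueError("all byte values 1..255 occur; no single-byte key works")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


payload = bytes.fromhex("81ec20010000")  # sub esp, 120h (contains zero bytes)
key = pick_xor_key(payload)
encoded = xor_bytes(payload, key)
decoded = xor_bytes(encoded, key)
```

Because XOR is its own inverse, the decoding routine in front of the shellcode only has to run the same operation once more over the encoded bytes.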
Of course I could start looking for these decoding loops. But this time, I choose a brute-force mace over the heuristic rapier; I load the entire document into memory, and systematically try out all of the byte values. After each scan, I once again check whether a PE or OLE signature has been found. Even with this file of just under one megabyte, the process does not take more than a minute.
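In Python, such a brute-force pass might look roughly like the following. It is a minimal sketch and not the scanner's actual implementation: the function names are made up for illustration, only single-byte XOR keys are tried, and the PE check merely follows the e_lfanew field at offset 0x3C to look for the "PE" header.

```python
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # OLE compound file signature


def looks_like_pe(buf: bytes, pos: int) -> bool:
    """True if 'MZ' at pos is followed by 'PE\\0\\0' at the e_lfanew offset."""
    if buf[pos:pos + 2] != b"MZ" or pos + 0x40 > len(buf):
        return False
    e_lfanew = int.from_bytes(buf[pos + 0x3C:pos + 0x40], "little")
    start = pos + e_lfanew
    return buf[start:start + 4] == b"PE\0\0"


def scan_brute(data: bytes):
    """Try every XOR key 1..255 and collect (key, offset, kind) hits."""
    hits = []
    for key in range(1, 256):  # key 0 would leave the data unchanged
        table = bytes(b ^ key for b in range(256))
        decoded = data.translate(table)  # XOR the whole buffer in one call
        pos = decoded.find(OLE_MAGIC)
        while pos != -1:
            hits.append((key, pos, "OLE"))
            pos = decoded.find(OLE_MAGIC, pos + 1)
        pos = decoded.find(b"MZ")
        while pos != -1:
            if looks_like_pe(decoded, pos):
                hits.append((key, pos, "PE"))
            pos = decoded.find(b"MZ", pos + 1)
    return hits
```

Checking the PE header in addition to the two-byte MZ marker keeps the number of false hits down, since MZ alone turns up by chance in almost any large decoded buffer.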
Bingo! With scan brute, my scanner suddenly spits out four files: an embedded OLE file, and three PE files. All four of them had been encoded via XOR with 0x85. The OLE file is Celebrities Without Makeup, which Procmon had already shown me. And when I upload the PE files to VirusTotal, a number of Trojans and downloaders are immediately detected.
Now, I am almost finished with my analysis. But I would still like to know a bit more about what the shellcode really does. Normally, the access to FS:30 pops up toward the beginning. Here, that is at 0x506e, and shortly before that, at 0x5004, DisView finds a typical function prologue:
sub esp, 00000120h
mov edi, esp
I now simply assume that this is where the code starts. But a static analysis of the assembler code would be too much trouble. Even with a really good disassembler like IDA Pro, the dynamically loaded import names and self-modifying code parts would make the process complicated. It would be better if I could watch the code live in the debugger. But wait a minute – didn't I write a tool that adds a working EXE shell to shellcode? I should only have to...
A bit later, MalHost-Setup creates evil.exe out of the shellcode starting at 0x5004, and I can now directly launch the EXE file. I don't want the file to immediately take off uncontrollably, so I have MalHost-Setup use the parameter wait to overwrite the first two bytes with 0xEB 0xFE. In Intel's machine language, that stands for a short jump with a displacement of -2, in other words a jump back to the address of the instruction itself, creating a compact infinite loop. That gives me enough time to get a coffee once it is running. Might as well let it take a few spins.
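Why these two bytes loop forever is easy to check by decoding them by hand. The sketch below assumes only the standard encoding of the short jump (opcode EB followed by a signed 8-bit displacement that is counted from the end of the instruction). The helper names are hypothetical, and patch_wait_loop merely imitates the idea of the patch; it is not MalHost-Setup's code.

```python
def short_jmp_target(address: int, code: bytes) -> int:
    """Decode a two-byte 'EB rel8' short jump located at `address`."""
    if len(code) < 2 or code[0] != 0xEB:
        raise ValueError("not a short jmp (opcode EB)")
    rel8 = int.from_bytes(code[1:2], "little", signed=True)
    # The displacement is relative to the next instruction (address + 2).
    return (address + 2 + rel8) & 0xFFFFFFFF


def patch_wait_loop(code: bytes, entry: int):
    """Replace the two bytes at `entry` with EB FE; also return the originals."""
    original = code[entry:entry + 2]
    patched = code[:entry] + b"\xeb\xfe" + code[entry + 2:]
    return patched, original


start = 0x5004  # offset of the assumed shellcode start, used as the address
target = short_jmp_target(start, bytes.fromhex("ebfe"))

prologue = bytes.fromhex("81ec200100008bfc")  # sub esp, 120h / mov edi, esp
patched, original = patch_wait_loop(prologue, 0)
```

Here target comes out equal to start: 0xFE is -2 as a signed byte, and -2 counted from the end of a two-byte instruction lands on the instruction's own first byte. The saved original bytes are what has to be written back before the code can be stepped through.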
Then it is OllyDbg's turn. Using File/Attach, I latch on to the evil.exe process with the debugger, and after Run/Pause (F9/F12) I promptly reach my infinite loop. To study the code, I have to patch it back into its original state. That's easy to do via the right mouse button and Follow in Dump/Selection. I hit CTRL-E for the edit mode, and 0xEBFE turns back into the original bytes 0x81EC; the function prologue I had before now reappears in the debugger's code window.
Now, I can get to work. In live debugging, the code quickly reveals its secrets. For instance, behind call [ebp+4] at 0x50ae lies the dynamic import of GetFileSize. Here, the shellcode goes through a sort of self-discovery process. In the loop, it tries out all of the file handles until it finds one linked to a file of 968,192 bytes – exactly the size of the PowerPoint file. It then jumps to certain points in the file, unpacks the programs embedded at those spots, writes them onto the hard disk, and launches them.
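The handle search can be imitated in Python as a rough analogue. The sketch does not use Win32 handles and GetFileSize as the shellcode does; it assumes ordinary file descriptors and os.fstat instead, and the function name find_fd_by_size and the upper limit of 1024 descriptors are my own choices.

```python
import os
import stat
import tempfile


def find_fd_by_size(expected_size: int, max_fd: int = 1024):
    """Return the first open descriptor of a regular file with this size."""
    for fd in range(max_fd):
        try:
            info = os.fstat(fd)
        except OSError:
            continue  # this number is not an open descriptor
        if stat.S_ISREG(info.st_mode) and info.st_size == expected_size:
            return fd
    return None


# Demo: keep a file of the right size open and let the loop find it.
DOC_SIZE = 968_192  # size of the PowerPoint file in bytes
with tempfile.TemporaryFile() as carrier:
    carrier.write(bytes(DOC_SIZE))
    carrier.flush()
    found = find_fd_by_size(DOC_SIZE)
    found_size = os.fstat(found).st_size if found is not None else None
```

The trick works because the exploited application still has the document open while the shellcode runs, so a handle with exactly the right file size has to exist somewhere in the process.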
Mission accomplished! This time, I'm satisfied with what I found. Of course, I had to write a complete scan suite to get all of this PowerPoint file's secrets, but no matter – I am certain that the suite will continue to serve me well. After all, vulnerabilities in the Office format are a dime a dozen.
We are aware that some anti-virus programmes raise an alert on the OfficeMalScanner-Suite. This is a false positive and the vendors have been notified.
Microsoft has also realized that we need better tools to analyse Office files, which is why it recently released its Office Visualization Tool (OffVis) to the public. It analyses the file structure and provides feedback about potential exploits. In my PowerPoint file, it recognises an exploit for a PowerPoint vulnerability that Microsoft closed a year ago along with 13 others (CVE-2009-0556 in MS09-017). But because it does not offer a way of identifying shellcode or detecting embedded executables, it cannot replace the OfficeMalScanner suite in my toolbox.
In our "Crime Scene Internet" series, experts examine suspicious files using every trick in the book. Look over their shoulder as they track down real malware – because the whole story could have happened exactly like this.
The expert in this episode, Frank Boldewin, is an IT security architect at GAD eG in Münster, Germany. In his scarce free time, he analyses new rootkit and trojan technologies and publishes tools and white papers on these topics at his website reconstructer.org, where the suite he used to investigate Office files in this article can also be downloaded. And by the way, it really was written roughly as described here. The next episode of CSI:Internet will deal with a time bomb in a PDF file. The first episode was Alarm at the pizza service.