The "CSI:Internet" series was originally published in c't magazine, commencing in issue 13/2010. For links to other articles in this series, please refer to our CSI:Internet HQ page. The code fragments in this article may cause your anti-virus scanner to issue an alert – this is a false alarm.
Episode 3: PDF time bomb
by Thorsten Holz
Tom sends me something on 'NTFS internals' – technical details of the Windows file system implementation. How did he know that this had been sitting on my to-do list for ages? Had I mentioned it at lunch? Curious, I open the attached PDF.
Before I even reach the end of the first paragraph, Adobe Reader closes of its own accord. In a bit of a doze, I click on the attachment for a second time and an action replay unfolds – the text appears in Adobe Reader and then the window vanishes all by itself just a few seconds later.
This is more than a little strange – suddenly wide awake, I take a closer look at the e-mail. What's with the formal "Regards, T Gibbs" at the end? Tom always signs off with "Cheers, Tom". A look at the full header tells me that this clearly hasn't been sent from one of our internal systems:
Received from 18.104.22.168
It's been sent from somewhere in Asia. Things are becoming clearer. The sender is faked and the PDF file is probably an attempt to infect my computer. But has it been successful?
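The same look at the header can be scripted. What follows is only a minimal Python sketch, not the procedure used here: the message text, host names and addresses are invented for illustration (the addresses come from ranges reserved for documentation), and it relies solely on the standard library's email parser.

```python
from email import message_from_string

# Hypothetical raw message; hosts and addresses are made up for this sketch.
RAW = """\
Received: from mail.internal.example (mail.internal.example [192.0.2.10])
\tby mx.internal.example; Mon, 5 Jul 2010 09:12:01 +0200
Received: from unknown (HELO sender.example) (203.0.113.77)
\tby mail.internal.example; Mon, 5 Jul 2010 09:11:58 +0200
From: "T Gibbs" <tom@internal.example>
Subject: NTFS internals

Regards, T Gibbs
"""


def received_chain(raw_message):
    """Return all Received headers in file order, with line folding removed."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received") or []
    # Collapse the folded continuation lines into single spaces.
    return [" ".join(hop.split()) for hop in hops]


for hop in received_chain(RAW):
    print(hop)
```

Each mail server prepends its own Received line, so the entries nearest the bottom are the ones closest to the true origin. Anything below the first trusted server can be forged by the sender.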
I could of course just restore the system image I created yesterday. That would take 30 minutes at most and I could then get on with doing the travel expenses that accounts has been nagging me about since last week. OK, that's it, I'm going to analyse the PDF file.
Inside a PDF file
First off, I need to refresh my knowledge of the PDF format. I vaguely remember that a PDF file is composed of various objects in a tree structure. Each object describes one aspect of the document – page content, for instance, is stored in one object and font type and size in another.
The plan is to analyse each object in the suspect file to get an understanding of what's going on. Before opening the file in a text editor, I pull down O’Reilly's "PDF Hacks" by Sid Steward from the bookshelf.
WordPad shows the characteristic "%PDF-1.4" string at the start, so it's definitely a PDF file. The basic PDF structure is simple and easily recognised. Individual objects within the file have the structure:
$nr $version obj
obj and endobj are fixed separators between objects. Objects are numbered sequentially using $nr. The version number is usually 0, as a document usually contains only one version of an object.
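This layout is regular enough to pull apart with a few lines of Python. The following is a rough sketch under stated assumptions, not a real PDF parser: the names OBJ_RE, split_objects and SAMPLE are made up for illustration, and the regular expression ignores streams, comments and the cross-reference table, all of which a deliberately malformed file can use to hide objects.

```python
import re

# Matches "<nr> <version> obj ... endobj"; DOTALL lets the body span lines.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def split_objects(data):
    """Map (number, version) -> raw body bytes for every object found."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header")
    return {
        (int(m.group(1)), int(m.group(2))): m.group(3).strip()
        for m in OBJ_RE.finditer(data)
    }


# Hypothetical two-object file, just enough to exercise the function.
SAMPLE = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 4 0 R >>\nendobj\n"
    b"4 0 obj\n<< /Type /Pages /Count 0 >>\nendobj\n"
)

for key, body in split_objects(SAMPLE).items():
    print(key, body)
```

Working on bytes rather than text matters here, because real PDF files mix readable structure with binary stream data.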
Object structure depends on the object type. Object parameters are typically given within a 'dictionary', delimited by << and >>. There is usually a table of contents at the start with /Type /Catalog – as there is in my 'NTFS internals' file:
1 0 obj
/Outlines 3 0 R
/Pages 4 0 R
This essentially consists of references in the form $i $j R, each pointing to version $j of object $i. Object three claims to be a kind of overview of content, with the actual pages following as /Pages. Towards the end, a few document properties, such as the read direction – L2R (left to right) – are specified. So far, so dull. I'm beginning to wish I'd done my travel expenses.
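Following such references by hand soon gets tedious, so a small helper is useful. This is a minimal sketch that assumes plain "/Key $i $j R" entries; REF_RE, find_references and the CATALOG sample are illustrative names, and references inside arrays or keys containing escaped characters are not handled.

```python
import re

# "/Key <nr> <version> R" inside a dictionary body.
REF_RE = re.compile(rb"/(\w+)\s+(\d+)\s+(\d+)\s+R\b")


def find_references(body):
    """Return {key: (number, version)} for each indirect reference."""
    return {
        m.group(1).decode("ascii"): (int(m.group(2)), int(m.group(3)))
        for m in REF_RE.finditer(body)
    }


# Catalog dictionary modelled on the fragment shown above.
CATALOG = b"<< /Type /Catalog /Outlines 3 0 R /Pages 4 0 R >>"

print(find_references(CATALOG))
# {'Outlines': (3, 0), 'Pages': (4, 0)}
```

The entry /Type /Catalog is skipped because its value is a name, not a reference.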
Object two includes more admin, which confirms my suspicion:
/Creator (Scribus 22.214.171.124)
/Producer (Scribus PDF Library 126.96.36.199)
Much as I appreciate open source, I'm sceptical about the likelihood of someone using the DTP program Scribus to lay out their knowledge of NTFS before converting it to a PDF. I scroll down further and it finally starts to get interesting.