According to the specification, the first two bytes of the data block represent the
RECORDHEADER. Reading the definition of this data type makes my hair stand on end. Why these cheapskates want to save a couple of admin bits for chunks of data that have long been measured in megabytes is a mystery to me. The upper 10 bits contain the tag type – here it should be 0x56 – and the remaining 6 bits its length.
On top of that, the little endian byte order of Intel's architecture has to be taken into account, which means the high-value bytes come last – so I jot it down on paper.
According to this, the two data blocks' RECORDHEADERs are
A8 15 and
8C 15; each decodes to the
0x56 tag type reported by
swfdump, with lengths of 40 and 12 respectively. So far, so good. This is followed by the number of scenes as an
EncodedU32 data type.
What on earth were they thinking of? The specification reads like something from the last millennium.
EncodedU32 holds an unsigned 32-bit integer which, depending on its size, is encoded using a variable number of bytes – between one and five – "to save space", as Adobe's spec kindly explains.
No wonder Flash security vulnerabilities seem to be never-ending. Instead of working with normal data types such as
unsigned int, they have to try to save a couple of bytes. But complexity is famously the enemy of security – and this just screams trouble. Just imagine if the designers of the formats for .exe files or CPU instructions had used this kind of data type. We'd be drowning in security vulnerabilities.
It's also slow, as each time an
EncodedU32 value is accessed, instead of a simple read operation, the decoder routine has to be run. This skimping on bits becomes even more incomprehensible when you think that SWF files can be compressed – a much more efficient way of saving space.
But I'm starting to rant – back to the supposed iPhone video's
SceneCount. If the highest bit of a byte is set, the next byte has to be added in. A glance at the hex dump tells me that the first four bytes after the RECORDHEADER all have their highest bit set. So I have to correctly add together the maximum of five bytes. Adobe helpfully provides a reference implementation in C for unpacking EncodedU32 values. Rather than shovelling bits about by hand, I quickly run it through the compiler.
Checking out the second, shorter data block with GetEncodedU32.exe yields a perfectly plausible
SceneCount of 1. But fed the first data block's bytes,
\xa6\xe1\x8a\xa0\x08, it coughs up a value of well over 2 billion. More than 2 billion scenes? That's impossible. My hunch was right. Someone's pulling a fast one.