Still, it was pretty unnerving that I had no idea what had caused the problem, or whether it might recur. It brought back uncomfortable memories of a case that a guy I used to work with once told me about. A large multinational customer of his ran an online bookmaking service and, shortly before the World Cup, was blackmailed with the threat that his servers would be taken out right in the middle of his peak period. The attackers followed up their threat with a demonstration of their capabilities by gumming up the server for several hours using a distributed denial-of-service attack (DDoS).
Analysis showed that the attack had been carried out using 'SYN flooding'. To establish a TCP connection to a server, a client starts out by sending a SYN packet. The server registers the connection request, sets up the required data structures in the form of a transmission control block (TCB) and sends out a SYN-ACK as a response. A genuine client then responds with a confirmatory ACK and, following this three-way handshake, the connection is established.
In the case of a SYN flood, the client sends several SYN requests with fake sender addresses to the server in rapid succession. The SYN-ACK packets disappear into the ether – but then it's not as if the client had any intention of responding anyway. The server under attack, which doesn't know this, has to keep the TCBs open for some time. The result is that its TCB buffer is filled with futile data, with no space left for legitimate requests – access to the server is blocked.
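The server-side bookkeeping described above can be sketched as a toy simulation – all names here (`Server`, `TCB_CAPACITY` and so on) are illustrative, not a real kernel API, and real backlogs are far larger than four entries:

```python
import random

# Toy model of server-side TCP handshake bookkeeping: each incoming SYN
# allocates a transmission control block (TCB) in a fixed-size backlog.
TCB_CAPACITY = 4  # real kernels use a larger, configurable backlog


class Server:
    def __init__(self):
        self.half_open = {}  # (client_ip, client_port) -> connection state

    def on_syn(self, src_ip, src_port):
        if len(self.half_open) >= TCB_CAPACITY:
            return "dropped"  # backlog full: legitimate SYNs are lost too
        self.half_open[(src_ip, src_port)] = "SYN_RECEIVED"
        return "SYN-ACK sent"  # server now waits for the client's ACK

    def on_ack(self, src_ip, src_port):
        if self.half_open.pop((src_ip, src_port), None):
            return "ESTABLISHED"  # three-way handshake complete
        return "ignored"


server = Server()

# Attacker: SYNs from spoofed addresses that will never send the ACK.
for i in range(TCB_CAPACITY):
    server.on_syn(f"10.0.0.{i}", random.randint(1024, 65535))

# Legitimate client: its SYN finds no free TCB and is dropped.
print(server.on_syn("203.0.113.7", 40000))  # -> dropped
```

Shrinking the TCB retention time just changes how quickly `half_open` drains; against a sustained flood the table stays full regardless.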
Of course you can reduce the TCB retention time or enlarge the TCB buffer, but even then you don't stand a cat in hell's chance against thousands of bots with all guns blazing. The one positive to come out of the story was that the bookie perceived the threat as a slight on his honour and was prepared to toss a whole heap of money around to get it dealt with – in short, my colleague was given carte blanche to take care of it.
To keep the server up and running even during high load periods, he installed a SYN proxy. This is an upstream server which does nothing but respond to SYN requests, only passing requests in which the client has demonstrated a genuine interest – by returning an ACK packet – to the server.
But even the SYN proxy's capacity to store data in TCBs isn't inexhaustible. To get around this problem, it uses a neat trick, first proposed by Dan J. Bernstein. This technique goes by the name of SYN Cookies. It generates an MD5 hash from the connection-related data – IP address and port – and a secret value, and uses this cookie as the initial sequence number of its SYN-ACK packet. This number would normally be randomly generated to prevent session hijacking, but the MD5 hash is as hard a nut to crack as a random number, so it does the job just as well.
If the client responds, thus completing the three-way handshake, the response packet contains confirmation of the sequence number in the form of the ACK. The server generates a new MD5 hash from the secret and the ACK packet's address and port, and compares this value with the sequence number confirmed by the client (the acknowledgement number minus one). If they match, the server knows that the client must have received the cookie and establishes the connection.
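The stateless check can be sketched in a few lines. This is a minimal version using only address, port and a secret, as in the text – Bernstein's full scheme additionally folds in a timestamp and an encoding of the MSS, and the `SECRET` value here is of course hypothetical:

```python
import hashlib

SECRET = b"not-for-production"  # hypothetical secret, rotated in practice


def syn_cookie(ip: str, port: int) -> int:
    """MD5 over connection data plus secret, truncated to a 32-bit ISN."""
    digest = hashlib.md5(f"{ip}:{port}".encode() + SECRET).digest()
    return int.from_bytes(digest[:4], "big")


# SYN arrives: reply with a SYN-ACK whose initial sequence number is the
# cookie -- nothing at all is stored on the server side.
isn = syn_cookie("203.0.113.7", 40000)

# ACK arrives: recompute the cookie and compare it with the sequence
# number the client has confirmed (its ACK number minus one).
ack_number = isn + 1  # what an honest client sends back
assert syn_cookie("203.0.113.7", 40000) == ack_number - 1  # accepted
```

A spoofed source never saw the SYN-ACK, so it cannot produce an ACK number that matches the recomputed cookie.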
The advantage of this procedure is that the server doesn't have to store data for spurious SYN packets. All it needs is enough processing power to calculate the cookies in real time. With a little help from the ISP, the combination of SYN proxy and SYN Cookies meant that the server was indeed able to fend off a SYN flood attack with a bandwidth of up to 100 MB per second. And, as my colleague had hoped, rather than get into a long drawn-out trial of strength, the blackmailer decided to cut his losses and look for a less well-defended victim.
Unfortunately, that wasn't much use to me. We're not talking World of Warcraft here, just an itty-bitty RPG, and I was going to have to get by on what I had to hand. Well, at least I had a sensor. When something like this had happened in the past (though my experience was more in the line of email and FTP), I had got into the habit of firing up the network monitor so that, if the problem cropped up again, I could at least see what had been going on beforehand.
To this end, I had set up a dedicated server in the data centre as a recording system, which allowed me to look at the entirety of traffic upstream of the firewall. This was especially practical given that, at the time, I wasn't administering the firewall myself. The recording system enabled me to determine whether problems were arising as a result of errors falling within my remit or were down to the firewall rule set.
I fired up Wireshark, an open source network analysis tool, on the recording system and set a simple capture filter on the forum server's IP address. I configured the capture to use multiple files with a ring buffer of 1,000 files, each 32 MB in size. This ensures that the hard drive never gets full, since the oldest files simply get overwritten by newer ones – though this does of course run the risk that important data will be overwritten if you're too slow to react. I started recording, logged out of the recording system, checked the monitoring system – all clear – and went to get myself a well-earned coffee.
I had barely set down my empty mug when the server status once more flipped from green to red. The server load was showing the same symptoms as before – everything was running as slow as dirt and I was seeing a whole heap of Apache processes.