The importance of load balancing

Back in August, there was a small epidemic of downtime across a broad swathe of the big names in public, free web services.

For a while Google seemed to be down, likewise Facebook, and I expect there’ll be other cases to point to.

Everyone is surprised by the idea that such globe-spanning services could ever be taken out of action, and it’s assumed that there must be a crazy issue with rooms/buildings/streets full of servers. Even some seasoned network people shake their heads at the amount of data and the decisions that need to be made to recover from such an outage. I don’t.

Instead, I think about the few visits and conversations I’ve had with people working at this kind of scale, and the critical role that’s played by some remarkably small, rare bits of kit.

Take a load off

I’m thinking about load balancers: devices that take traffic apparently aimed at one web address and parcel it out across an effective infinity of machines within a private network owned by the service provider. Websites make a lot of use of these devices, since apparently nobody has worked out how to hand off traffic from server to server – either that, or a division of labour between traffic manager and server-with-disks is seen as beneficial.

Sometimes it’s even worse than that: the problem isn’t only about how many lumps of brain power you can put in the block diagram of your architecture, but that the basic networking standards we all use every day don’t have the right attributes to help with the job.
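
At its simplest, though, the job these boxes do is easy enough to sketch. The snippet below is a minimal, illustrative round-robin reverse proxy in Go – the back-end addresses and ports are invented, and a real device layers health checks, session persistence and hardware-assisted packet handling on top of this bare rotation; it isn’t how any particular vendor does it.

```go
// A minimal, illustrative round-robin reverse proxy: one front-door
// address, with requests parcelled out across a pool of back ends.
// The back-end addresses below are invented for the sake of example.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func main() {
	// Hypothetical back-end pool; a real deployment would add health
	// checks, session persistence and many more members.
	backends := []string{
		"http://10.0.0.1:8080",
		"http://10.0.0.2:8080",
		"http://10.0.0.3:8080",
	}

	proxies := make([]*httputil.ReverseProxy, len(backends))
	for i, b := range backends {
		u, err := url.Parse(b)
		if err != nil {
			log.Fatal(err)
		}
		proxies[i] = httputil.NewSingleHostReverseProxy(u)
	}

	var next uint64
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Pick the next back end in strict rotation.
		i := atomic.AddUint64(&next, 1) % uint64(len(proxies))
		proxies[i].ServeHTTP(w, r)
	})

	// The single address clients actually see; a real front door
	// would listen on port 80 or 443.
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

Point this at a handful of web servers and every request arriving at the one front-door address gets handed out in strict rotation – which is precisely the division of labour between traffic manager and server-with-disks mentioned above.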

Think of this comparison: most freight these days moves by road, and it’s taken for granted to be a good idea to arrange your products so that they fit into one box or a shipping container. Where the freight won’t fit into any of those containers, the only way to move it around is by breaking the rules – one of my client firms moves its large products to a nearby port by driving down a preconfigured section of roads that feature unboltable lampposts…

This is a very physical example of changing the rules to overcome a specific difficulty, and equivalent approaches are becoming necessary in truly large server deployments.

IP networking as we know it isn’t so brilliant when it comes to managing and directing the traffic patterns faced by Google or Facebook.
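
To see how little help the everyday standards give you, consider plain DNS: a single name can resolve to several addresses, but the answer carries no notion of load, health or ordering – the client simply picks one and hopes. A quick sketch, using example.com as a stand-in for any multi-homed name:

```go
// A quick illustration of how coarse the standard tooling is:
// DNS will hand back several addresses for one name, but nothing
// in the answer says which back end is alive or under load.
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	// example.com is a stand-in; substitute any multi-homed name.
	addrs, err := net.LookupHost("example.com")
	if err != nil {
		log.Fatal(err)
	}
	for _, a := range addrs {
		fmt.Println(a) // no weights, no health status, no guarantees
	}
}
```

That gap between “here are some addresses” and “send this packet to the machine best placed to answer it” is exactly where the dedicated kit earns its keep.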

F5 Networks

I met with one of the vendors of the kit that does this work back in early 2013. Called F5 Networks, it makes relatively small boxes – four would fit in one network rack – that can have hundreds of Ethernet connections of all speeds and media types, as well as the more exotic, long-haul data-shipment standards.

Although the company was a little coy about its big-name users and exactly what they do with the kit, it was hard for it to deny that its biggest deployments seldom exceed ten live boxes per customer.

This means that its smaller deployments may have only two or three boxes. Admittedly, this is quite some box we’re talking about – certainly something in quite a different league from the usual telecoms box that’s programmable via a serial cable and an antediluvian armoured laptop, in a tent up a mountain.

We’re talking here about months of thought and drawings and discussions before every configuration change, with huge amounts of instrumentation, log files and brain-bending interfaces between protocols. It’s a major piece of work just to get these things into action.

If you want to feel as thoroughly inadequate as I did, you can take a quick test run of the F5 virtual-machine version of the company’s load balancer software stack.

It’s usable, the company says, for only a few thousand machines. You can take a config you’ve put together and upload it to the heavy-metal physical version. This is just as well, since the gap in performance between what a VM can do and what the dedicated hardware can do is startling.

Running the numbers

Whenever you look at these huge public systems, I’d suggest that the sheer count of servers your request might end up being run on is largely irrelevant. The more interesting figure is how many load balancers and protocol converters it will pass through on the way.

A handy current benchmark was given inadvertently in a presentation by the IT director of NASCAR this year, at an event hosted by NaviSite. He had 2.5 million regular users, and while there are plenty of servers running in parallel within the NASCAR setup, there’s only the one firewall (admittedly, quite large, and equipped with more than the usual two Ethernet connections) sitting between them and the impatient, impassioned and impolite hordes of NASCAR fanboys.

As I’ve said here before, most of the networking kit out there hasn’t benefited from the hothouse development that’s made PCs and servers so incredibly efficient over the last half-decade or so.

Quite a lot of kit from ten or even 15 years ago is still shovelling packets, with fans staggering along at maximum and power supplies humming painfully away.

Eventually, even for the very biggest operators, the downtime involved in a root-and-branch upgrade will become worth it as new kit emerges that uses modern ARM processors, smarter code and more efficient power management.
