There’s no particular reason to keep that many twice-daily snapshots – we do it because we can, and because the overhead remains acceptable. We’ve also duplicated all our machines so that all data is stored again off-site, and the snapshots are duplicated too.
Even with all these snapshots we can still run into trouble if someone creates a new file and loses it between snapshots, but this is true of any backup regime unless you keep a continuous backup. However, before getting into that, let’s talk about more conventional backup regimes.
The A to B of open-source backup
Over the years the two mainstays of open-source backup have been A, for Amanda, and B, for Bacula. We ran Amanda on all our systems for ten years before we replaced it with duplication and snapshots. We considered switching to Bacula several times, but never quite made the move.
Amanda is a server-side backup system designed to back up multiple computers to multiple storage devices. It offers various forms of encryption, and its great strength is the way in which it schedules backups among its collection of clients.
Most backup systems start off by taking a complete copy of the data to be backed up. This is known as a full, or level-zero, backup. When the next backup time arrives the system will back up only those resources that have been changed since the previous backup.
Such partial, or incremental, backups may be structured into various levels. If your first full backup, level zero, is taken on Monday, then on Tuesday you take a level one backup of all the files that have changed since Monday. On Wednesday you have a choice: a level two backup of just those files that have changed since Tuesday, or another level one backup of all files that have changed since Monday, including those already backed up on Tuesday.
On Thursday you could take a level three backup of just those files changed since Wednesday (if Wednesday's backup was level two), a level two backup of files changed since Tuesday, or another level one backup.
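The rule behind these choices is that a backup at a given level holds whatever has changed since the most recent backup at any lower level. A minimal Python sketch of that rule follows; the function name `reference_backup` and the list-of-tuples history are illustrative assumptions, not part of any backup tool.

```python
from typing import List, Optional, Tuple

# Hypothetical helper: history is a list of (day, level) pairs, oldest first.
def reference_backup(history: List[Tuple[str, int]], level: int) -> Optional[str]:
    """Return the day whose backup a new backup at `level` is measured against.

    A level-N backup holds files changed since the most recent backup
    with a level lower than N. Level zero copies everything, so it has
    no reference and the function returns None.
    """
    if level == 0:
        return None
    for day, past_level in reversed(history):
        if past_level < level:
            return day
    raise ValueError("no lower-level backup to base this one on")

# Monday full, Tuesday level one, Wednesday level two, as in the text.
week = [("Mon", 0), ("Tue", 1), ("Wed", 2)]
print(reference_backup(week, 3))  # Wed
print(reference_backup(week, 2))  # Tue
print(reference_backup(week, 1))  # Mon
```

The three printed days are the three Thursday options described above.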
You make these choices to control the cost of the backup in terms of time and disk space consumed. In principle, you want as many level zero backups as possible, since they have everything in them and therefore recovery will be easy, but such backups will be large.
If you take more incremental backups with higher levels they should be smaller and faster, but to restore data you may well have to search through multiple backups.
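To see how the choice of levels affects recovery, the following sketch works out which backups would have to be read to restore the latest state. It assumes the same illustrative (day, level) history format; `restore_chain` is a made-up name, and real tools track this in their own catalogues.

```python
from typing import List, Tuple

# Hypothetical helper: history is a list of (day, level) pairs, oldest first.
def restore_chain(history: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the backups needed for a full restore, in the order to apply them."""
    chain = []
    lowest = None
    # Walk backwards from the newest backup, keeping each one whose level
    # is lower than anything kept so far, until a full backup is reached.
    for day, level in reversed(history):
        if lowest is None or level < lowest:
            chain.append((day, level))
            lowest = level
            if level == 0:
                break
    if not chain or chain[-1][1] != 0:
        raise ValueError("history contains no full backup")
    chain.reverse()
    return chain

rising = [("Mon", 0), ("Tue", 1), ("Wed", 2), ("Thu", 3)]
flat = [("Mon", 0), ("Tue", 1), ("Wed", 1), ("Thu", 1)]
print(len(restore_chain(rising)))  # 4
print(restore_chain(flat))         # [('Mon', 0), ('Thu', 1)]
```

With rising levels every backup of the week is needed. With repeated level one backups only Monday's and Thursday's are needed, at the price of each daily backup being larger.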
Once you’ve set up these different levels of backup, a problem arises over backing up a collection of machines automatically to a collection of stores of fixed size (which might have been tapes in the past, but are nowadays more likely to be something like a partition containing a tenth of a large hard disk).
Inventing an optimal schedule for a whole collection of machines in which these different levels are mixed up is very hard, and Amanda’s party piece was to take that responsibility away from the operator who sets up the backup software.
Instead, the Amanda scheduler asks each machine how big its backup would be for a full level zero backup, and for each level of possible incremental backup. It then arbitrates between all the machines and the available backup store sizes, schedules the backups and stores them.
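The flavour of this arbitration can be shown with a toy planner. This is not Amanda's actual algorithm, only a greedy sketch under simple assumptions: each machine reports an estimated size per level, there is a single store of fixed capacity per run, and the machines that have gone longest without a full backup are promoted first. The names `plan_run`, `estimates` and `days_since_full` are all hypothetical.

```python
from typing import Dict

def plan_run(estimates: Dict[str, Dict[int, int]],
             days_since_full: Dict[str, int],
             capacity: int) -> Dict[str, int]:
    """Pick a backup level for each machine so the total fits in `capacity`.

    estimates[host][level] is the size that host reported for that level.
    """
    # Start every machine on whichever level it estimated as smallest.
    plan = {host: min(sizes, key=sizes.get) for host, sizes in estimates.items()}
    used = sum(estimates[host][level] for host, level in plan.items())
    if used > capacity:
        raise ValueError("even the smallest backups do not fit")
    # Promote machines to a full backup, most overdue first, while space remains.
    for host in sorted(estimates, key=lambda h: days_since_full[h], reverse=True):
        if 0 not in estimates[host]:
            continue
        extra = estimates[host][0] - estimates[host][plan[host]]
        if used + extra <= capacity:
            plan[host] = 0
            used += extra
    return plan

# Made-up size estimates in gigabytes for three made-up machines.
estimates = {
    "alpha": {0: 100, 1: 10, 2: 4},
    "beta": {0: 60, 1: 20},
    "gamma": {0: 200, 1: 5},
}
days_since_full = {"alpha": 6, "beta": 2, "gamma": 4}
print(plan_run(estimates, days_since_full, capacity=150))
# {'alpha': 0, 'beta': 1, 'gamma': 1}
```

Here only alpha gets its full backup: gamma's would not fit at all in a 150GB store, and beta's no longer fits once alpha has been promoted.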
All of this works automatically – the only subtlety being some adjustment to the initial setup if all the full, level zero backups won’t fit on the chosen backup media (if you have this problem, don’t try to back up everything on the first day; back up some machines and then add more on subsequent days).
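The advice about spreading the first full backups over several days amounts to a simple packing exercise. The sketch below uses a first-fit approach with made-up machine names and sizes, and it ignores the incremental backups that machines added on earlier days would also need, so treat it as an illustration of the idea rather than a sizing tool.

```python
from typing import Dict, List

# Hypothetical helper: full_sizes maps each machine to its full backup size.
def stagger_fulls(full_sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Group machines into days so each day's full backups fit in `capacity`."""
    days = []  # each entry is [remaining capacity, list of hosts]
    # Place the largest backups first; each goes into the first day with room.
    for host, size in sorted(full_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{host} does not fit on the media at all")
        for day in days:
            if day[0] >= size:
                day[0] -= size
                day[1].append(host)
                break
        else:
            # No existing day has room, so start a new one.
            days.append([capacity - size, [host]])
    return [hosts for _, hosts in days]

sizes = {"alpha": 100, "beta": 60, "gamma": 200, "delta": 40}
print(stagger_fulls(sizes, capacity=250))
# [['gamma', 'delta'], ['alpha', 'beta']]
```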