The dos and don’ts of VM snapshots
Just as there are programs that stop you trying to virtualise them, when arguably they shouldn’t, so too are there things you can do once you’re virtualised that have unexpected and painful results, where the basic problem isn’t a shortage of hardware power but, if anything, an excess. In this case, it’s all about snapshots.
This is a word that crops up in several contexts in computing, but in this case I’m referring to that feature, common to pretty much everyone’s hypervisor, which lets you freeze the state of a virtual machine at the click of a button.
The idea is that, having taken such a snapshot, you can roll back to it at any time, no matter what evil has befallen your virtual machine (VM) in the meantime. This may sound rather exotic, but the earliest adopters of virtualisation were all developers for whom it was pretty much essential to be able to roll back the clock after their code had run hog-wild and trashed the operating system. Even now, on the largest and most enterprise-focused hypervisors, you’ll find a “snapshot” button.
Snapshots can be of either the disk state or the memory state of the machine, or both, and are required to perform the virtualisation party trick of moving a running guest machine from one host to another (but let’s not get carried away with that here). Let’s think for a minute about disk snapshots. I know what the refusenik types are going to say: how can it be an actual “snapshot” when any fool knows that even the world’s fastest backup utility takes an hour at the very least to read the 30 to 60GB that make up a typical server’s boot partition?
Good question, is my response, and the answer is that the snapshot is actually more of a redirect process. When you hit that snapshot button (and for reasons shortly to be made clear, please don’t click the things in your hypervisor as I describe them), what actually happens in Hyper-V and others is that the hypervisor will stop writing to its original disk image file.
Reads of untouched blocks still carry on going back to it, but all write activity goes to a new file containing the blocks your activity changed, plus a little marker in that new file that says, in effect, “the updated copy of this block is over here”. Eventually you’ll have quite a large “new activity” file (or “differencing disk” as Microsoft calls it) plus an unmolested original virtual hard disk file. If you press “snapshot” more than once you’ll get several differencing disks, each of which depends on every one of its ancestors.
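To make the redirect idea concrete, here is a toy model of it in Python. Treat it strictly as a sketch: the names (DiskLayer, VirtualDisk, chain_length), the four-byte block size and the rollback behaviour are my own inventions for illustration, and no real hypervisor's on-disk format is this simple.

```python
from typing import Dict, Optional

BLOCK_SIZE = 4  # hypothetical, tiny block size purely for illustration


class DiskLayer:
    """One file in the chain: the original image or a differencing disk."""

    def __init__(self, parent: Optional["DiskLayer"] = None) -> None:
        self.parent = parent
        # Holds only the blocks written while this layer was the active one.
        self.blocks: Dict[int, bytes] = {}

    def read(self, block_no: int) -> bytes:
        # Walk from this layer back through its ancestors until one of
        # them turns out to hold a copy of the block.
        layer: Optional[DiskLayer] = self
        while layer is not None:
            if block_no in layer.blocks:
                return layer.blocks[block_no]
            layer = layer.parent
        return b"\x00" * BLOCK_SIZE  # never written anywhere: read as zeros


class VirtualDisk:
    """What the guest sees: a single disk, whatever the chain looks like."""

    def __init__(self) -> None:
        self.active = DiskLayer()

    def write(self, block_no: int, data: bytes) -> None:
        # Writes only ever land in the newest layer.
        self.active.blocks[block_no] = data

    def read(self, block_no: int) -> bytes:
        return self.active.read(block_no)

    def snapshot(self) -> DiskLayer:
        # Nothing is copied: freeze the current layer and start a new,
        # empty differencing layer on top of it.
        frozen = self.active
        self.active = DiskLayer(parent=frozen)
        return frozen

    def rollback(self, frozen: DiskLayer) -> None:
        # Discard everything written since the snapshot was taken.
        self.active = DiskLayer(parent=frozen)

    def chain_length(self) -> int:
        # Number of files a read may have to consult in the worst case.
        count, layer = 0, self.active
        while layer is not None:
            count, layer = count + 1, layer.parent
        return count


if __name__ == "__main__":
    disk = VirtualDisk()
    disk.write(0, b"boot")
    snap = disk.snapshot()       # instant, however big the disk is
    disk.write(0, b"oops")       # goes to the differencing layer only
    print(disk.read(0), snap.read(0), disk.chain_length())
    disk.rollback(snap)
    print(disk.read(0))
```

Two things fall out of the model. Taking the snapshot costs almost nothing, because no data is copied at that moment. The bill arrives later: every extra snapshot adds another layer, and in this sketch at least, a read of a long-unchanged block has to walk the whole chain back to the original before it finds its data.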
For a developer this is great stuff: unlike most server operating shops, a developer’s work cycle consists of constructing sacrificial VMs that their buggy code can blow to pieces, so each snapshot tends to be speculative and the machines in question have discardable disk contents. If you can see this one coming, award yourself a virtualisation gold star. What happens in a typical commercial server deployment is completely different: almost all the VM deployments I’ve been involved with have been bumping up against the poorly defined limits of what’s achievable on a virtual disk that sits on top of another, very much non-virtual, physical drive that has itself been formatted by a hypervisor and/or host OS.
Many people think there should be no overhead involved in this type of piggy-backing, so that if you create a 300GB data volume as a VM disk inside a 500GB hypervisor volume, the VM gets 100% of the performance of the physical drive underneath. My experience says otherwise, and so does the rest of the marketplace, where vast amounts of money are made from separating boot partitions (which live on VMs) from data drives (which live on Storage Area Networks) so that everything runs as smoothly as it can.