Choosing a file system
An article about file systems may not strike you immediately as the most productive way to spend the next ten minutes of your life – for most of us, which file system we use doesn’t matter much. On a desktop PC, we typically use the internal disk purely as somewhere to put files, and questions of performance and resilience don’t arise. This may even be true of some servers, but sometimes the choice of file system, and how it’s organised and managed, can become of vital importance. Choosing the wrong approach and having to change later can result in downtime and an awful mess. In this column, we’ll look at how Unix systems organise disk space, concentrating on open-source developments (and, where these aren’t available for Windows, commercial variants usually are).
Whether your machine runs Unix or Windows, the job of its file system is the same – our data is put into files, which are put into directories, and those directories are typically organised as a tree (although some directories may be shared between different parts of the tree). The structure of these directory trees, who owns the files in them and who is allowed to access them is what we call metadata (literally, data about data). The operating system maps all this data and metadata onto the actual hard disk hardware present. Most disks are organised into blocks of fixed size, arranged in concentric circles or tracks on the revolving platter, but some employ variable-sized blocks, and most have multiple platters and read/write heads. The operating system must hide all these specifics of disk operation from the user so efficiently that it’s completely transparent, and how this is achieved has altered radically over the years. Disks now contain far more data than they used to, and they also work differently.
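To make the metadata idea concrete, here's a minimal sketch using Python's standard library on a Unix system – the path /etc is just an example; any existing file or directory works. The stat call returns the file's metadata (permissions, ownership, size, timestamps) without touching its contents:

```python
# Inspect a file's metadata (not its data) via the stat system call.
import os
import stat

path = "/etc"                            # example path; substitute any file
st = os.lstat(path)                      # fetch the inode's metadata

print(stat.filemode(st.st_mode))         # permission bits, e.g. 'drwxr-xr-x'
print("owner uid:", st.st_uid, "group gid:", st.st_gid)
print("size:", st.st_size, "bytes; last modified:", st.st_mtime)
```

Everything printed here lives in the file system's metadata structures (the inode, on Unix), which is why listing a directory is cheap even when the files in it are enormous.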
Some operating systems used to take account of precise drive construction details to ensure maximum performance – for example, by allocating blocks in a way that takes account of platter rotation speed – but such optimisation techniques no longer work when a drive may declare itself to be organised one way but actually be organised internally in another way. For example, a bunch of drives organised as a RAID array will hide from the operating system many details of their actual internal organisation.
There’s a great deal of file system folklore among a certain generation of systems managers, and inappropriate optimisation techniques inherited from the old days are sometimes still mechanically applied. We were recently looking at a client’s system where disk drive contents had been organised so that, the system manager believed, his most recently used files were stored in the middle of the disk, keeping the read heads in the optimal position. This might have been a reasonable optimisation for a particular generation of early disk drives, but multiple heads make it unlikely to be optimal nowadays. There are also stories (almost certainly apocryphal) of system managers who place certain files on the outside of the platter because they believe it’s moving faster there and so data will be fetched more quickly.
From some standpoints, the job of the operating system has got easier, since it’s less exposed to the vagaries of individual drive mechanisms, but it also has far more resources to organise, such as the caching of disk contents in memory. Under both Unix and Windows, the amount of metadata cached can have a marked effect on performance; in fact, most Unix systems require such caching to get anything like reasonable performance.
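On Linux you can see this caching at work by reading /proc/meminfo – a rough, Linux-specific illustration, not a portable interface. The "Cached" figure covers file contents held in the page cache, while "Slab" includes the kernel's inode and dentry caches, i.e. cached metadata:

```python
# Read Linux's /proc/meminfo to see how much RAM the kernel devotes to
# caching file data and metadata (Linux-specific; values are in kB).
fields = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        fields[key] = int(rest.split()[0])   # first token after ':' is the kB value

print("Page cache (file contents):", fields["Cached"], "kB")
print("Slab (includes inode/dentry metadata caches):", fields["Slab"], "kB")
```

On a busy server these figures routinely run to gigabytes, which is exactly the point: the kernel spends spare memory avoiding disk reads, and metadata lookups benefit most.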