Office goes XML
Microsoft has made a major announcement about the future of its Office file formats, and to understand the profound nature of this change a little history is in order. For many years, Microsoft has been using private file formats for storing Office data files: for example, in the case of Excel, this was called BIFF (for Binary Interchange File Format, if memory serves). In the mid-1990s, Microsoft changed the Office formats to accommodate Structured Storage, which was an object-oriented storage technology from the OLE (object linking and embedding) team.
In a nutshell, a structured storage file was a complete filing system contained within a single file. The internal data structure was identical – just as an NTFS file system has the same basic form on all NTFS formatted disks – but the data that you poured into each ‘file’, or ‘stream’ as it was called, was unique to each application. This meant that all Office files, whether they were XLS, DOC or PPT, were structured the same internally, but the nature of the data in the various streams was different, depending on the application. In essence, to take the example of Excel, Microsoft just poured Excel BIFF format data into the primary stream of the structured storage file.
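Because the outer container is the same whichever application wrote it, you can recognise a structured storage file without knowing anything about the data inside. The sketch below is a minimal illustration and not a parser: it checks only the eight-byte signature that opens an OLE structured storage (compound) file. The function names are hypothetical ones chosen for this example, and a matching signature tells you nothing about whether the streams inside hold Word, Excel or PowerPoint data.

```python
# Eight-byte signature found at the start of an OLE structured storage file.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_structured_storage(data: bytes) -> bool:
    """Return True if the bytes begin with the structured storage signature."""
    return data[:8] == OLE_SIGNATURE


def is_structured_storage_file(path: str) -> bool:
    """Read the first eight bytes of a file on disk and test the signature."""
    with open(path, "rb") as handle:
        return looks_like_structured_storage(handle.read(8))
```

An old-style XLS, DOC or PPT file should all pass this check, which is the point: the filing system is identical, and only the contents of the streams differ.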
Structured Storage was pretty well documented by Microsoft, but it did require you to use the OLE libraries to access the data. That was a significant problem for vendors on other platforms such as Linux and the Mac, because they had to do the hard work of replicating the OLE libraries for structured storage access. Once you got into the primary stream, you had access to the native data for that application, but this wasn’t documented unless you signed up to significant NDAs (non-disclosure agreements) with Microsoft. This added yet another layer of complexity to the whole reverse-engineering problem for vendors who wanted to read and write Office files. There is no question that Microsoft used this technology lock-in to keep customers on the Windows platform and using real Office as opposed to its rivals. Moving to a competitive product was too horrible to contemplate, because you would have to deal with any of the numerous glitches that could occur when using a third-party application to read an Office file.
With Office 2003, however, Microsoft turned the tables on us yet again. It published an XML schema for the Office file formats of Excel and Word. These XML file formats were very complete, lacking only the most esoteric of functions (arbitrary text angle rotation in a cell in Excel being one that I remember). These new file formats were published on the Microsoft website and were available for anyone to use, even purveyors of competitive products, but you did have to acknowledge that Microsoft owned the intellectual rights to the file formats. At that time, there were howls from Microsoft’s competitors that it was doing this merely as a short-term measure and that there was no way Microsoft would follow up on its new openness. They noted that XML wasn’t the default file format; the default remained the traditional binary format.
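To give a flavour of what an openly published XML format means in practice, here is a minimal sketch that pulls the paragraph text out of a Word 2003 XML document using nothing but Python's standard library. The sample document is a hand-written, heavily stripped-down fragment, not the output of Word itself, and the function name is a hypothetical one chosen for this example. The namespace and element names are assumed to be those of the Office 2003 WordprocessingML schema; real files carry far more markup than this.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for Word 2003 XML (WordprocessingML) documents.
WORDML_NS = "http://schemas.microsoft.com/office/word/2003/wordml"

# Hand-written, stripped-down sample; real Word output is far more verbose.
SAMPLE = (
    '<w:wordDocument xmlns:w="' + WORDML_NS + '">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello from </w:t></w:r><w:r><w:t>plain XML</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body>"
    "</w:wordDocument>"
)


def paragraph_texts(xml_text: str) -> list[str]:
    """Return the text of each w:p paragraph, joining its w:t text runs."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.iter(f"{{{WORDML_NS}}}p"):
        runs = [t.text or "" for t in para.iter(f"{{{WORDML_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


if __name__ == "__main__":
    for line in paragraph_texts(SAMPLE):
        print(line)
```

No OLE libraries and no NDA-protected stream layout are involved: any platform with an XML parser can read the content, which is exactly what made the move significant for competitors.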
Well, with this new announcement about XML support in the forthcoming Office 12 release, Microsoft has in fact delivered the next stage. It is most significant, not merely because Microsoft has extended the scope of the XML file formats to include PowerPoint, but because it is going to make XML the standard, default format for all Office components. In other words, the old OLE structured storage binary format is now considered legacy technology. The new files will be called DOCX, XLSX and PPTX to differentiate them from the binary versions.