XML all round

Surprise, surprise – Microsoft has just announced that the next version of Office, currently called Office 12, will change the default file formats for Word, Excel and PowerPoint from their current proprietary binary formats to a new standard called ‘Microsoft Office Open XML Formats’. These new formats are based on the XML formats currently used by Office 2003, but with the addition of ZIP compression. This makes the files smaller and lets them hold embedded objects such as images in the same file as the text and formatting. If all this sounds somewhat familiar, that may be because this is exactly how OpenOffice 2 will store its documents when it is released. Other Office applications such as Visio, Access, Project, OneNote, Publisher, FrontPage and Outlook may get new file formats at some point in the future, but nothing has been announced at this juncture.
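To make the ‘XML plus ZIP’ idea concrete, here is a minimal Python sketch that bundles an XML text part and a binary picture into a single compressed archive, using only the standard zipfile module. The part names (word/document.xml, word/media/image1.jpeg) and the helper build_toy_package are illustrative assumptions rather than Microsoft's published layout; a real Office 12 file will need additional parts and metadata, so this toy package will not open in Word.

```python
import io
import os
import zipfile


def build_toy_package(document_xml: str, image_bytes: bytes) -> bytes:
    """Bundle an XML part and a binary image part into one ZIP archive in memory.

    The part names are hypothetical and only illustrate the idea of several
    parts travelling together in one compressed file.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        # Text and formatting live in an XML part.
        package.writestr("word/document.xml", document_xml)
        # The embedded picture is stored as its own part inside the same file.
        package.writestr("word/media/image1.jpeg", image_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    xml_text = "<document><body><p>Hello, Office 12</p></body></document>"
    fake_jpeg = os.urandom(4096)  # random bytes stand in for real image data
    data = build_toy_package(xml_text, fake_jpeg)
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for info in package.infolist():
            # Part name, uncompressed size, compressed size.
            print(info.filename, info.file_size, info.compress_size)
```

Running it lists each part with its uncompressed and compressed size, which shows that one file really does carry both the markup and the picture.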

Microsoft boasts that these new file formats will bring users the benefits of smoother data interoperability, better security, improved error recovery and dramatically reduced file sizes – all without sacrificing backwards compatibility. Now, the only way the company can keep these new files compatible with older versions of Office is to release converters. With those, older versions of Office will be able to open and save files in the new formats, but users of previous versions will have to download and install the converters first. So, for which versions of Office will Microsoft provide converters? Office 2003 and Office XP will obviously be supported. Office 2000 is still used by a fair number of people, but what about Office 97 or even Office 95? Those versions are still used, but by fewer and fewer people. Will Microsoft bother to release file-format converters for software that will by then be nine to 11 years old? The current answer appears to be no. Microsoft has announced converters only for Office 2000 and later, so if you use Office 97 or 95, you will not be able to open documents saved in the new Office 12 default formats.

Apparently, the file extensions for the new Office 12 files will be formed by adding a final ‘x’ to the old binary format's extension, so Word documents will become *.docx, Excel workbooks will be *.xlsx, and so on. Most documents will end up smaller in the new formats because they are compressed: some could be up to 75 per cent smaller. There may not be much reduction, however, for a document that contains a lot of embedded pictures, because JPEG, TIFF and GIF images are usually already compressed and compressing them again typically achieves very little. In some circumstances a file could even become larger in the new format, but that should be rare.
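The size argument can be checked with Python's zlib module, which implements Deflate, the compression method most ZIP files use. In this sketch the repeated row of markup is invented, and random bytes are an assumed stand-in for already-compressed JPEG or GIF data; neither says anything precise about real Office 12 file sizes.

```python
import os
import zlib


def compressed_fraction(data: bytes) -> float:
    """Return the Deflate-compressed size as a fraction of the original size."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive markup, like much of the XML inside an office document, compresses well.
xml_bytes = ("<row><cell style='Normal'>Quarterly total</cell></row>\n" * 2000).encode("utf-8")

# Random bytes stand in for JPEG/GIF data: already-compressed data looks
# close to random to a second compressor, so there is little left to squeeze out.
already_compressed = os.urandom(len(xml_bytes))

print(f"XML: {compressed_fraction(xml_bytes):.1%} of original size")
print(f"Pre-compressed data: {compressed_fraction(already_compressed):.1%} of original size")
```

The repetitive markup shrinks to a small fraction of its size, while the random data typically comes out marginally larger than it went in, which mirrors the rare case described above.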

The XML formats currently used by Office 2003 are plain text, which is to say that you can open and edit them in Notepad if you know what you are doing. It also means any software tool able to manipulate XML can work on Office XML files. You can create an Excel template – complete with headings, formatting and custom XML markup – and save it in XML format, then have a server-based process open that XML file, add some data to the custom XML schema and save a new copy of the file including that data. Voilà, you have a new Excel workbook based on your own template, filled with data, and the server process didn’t need to use the Excel automation interface at all – it just needed to know how to deal with XML.

Automating what are really interactive applications from a server is never a good idea: unexpected circumstances can make the application display a dialog (such as an error message) and wait for the user to respond, and if there is no user to see the message, the process will wait indefinitely. Using an XML-based file format means you do not need to install the Office suite on the server at all, nor drive the applications through their object model; you simply extract and add data to Office documents directly using XML tools.

However, if the new Office 12 file formats are going to consist of XML inside a ZIP file, such server processes will need to unzip and zip the files to get at the XML inside them. Tools to do this are included in the WinFX Beta, in the System.IO.Package namespace, but other ZIP tools will work just as well.
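As one example of ‘other ZIP tools’, here is a minimal Python sketch of the server-side process described above, using only the standard zipfile and xml.etree.ElementTree modules: it unzips a template package, appends data to a custom XML part and zips everything into a new copy, with no Office installation or automation involved. The part name customXml/item1.xml, the <record> element and the fill_package helper are hypothetical; a real Office 12 package may name and structure its parts differently.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical name of the part holding the template's custom XML data;
# real Office 12 packages may use different part names and schemas.
DATA_PART = "customXml/item1.xml"


def fill_package(template: bytes, records: list[dict[str, str]]) -> bytes:
    """Return a new ZIP package: a copy of the template with <record>
    elements appended to the root element of DATA_PART."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template)) as source:
        with zipfile.ZipFile(output, "w", compression=zipfile.ZIP_DEFLATED) as target:
            for name in source.namelist():
                payload = source.read(name)  # unzip one part
                if name == DATA_PART:
                    root = ET.fromstring(payload)
                    for record in records:
                        element = ET.SubElement(root, "record")
                        for field, value in record.items():
                            ET.SubElement(element, field).text = value
                    payload = ET.tostring(root, encoding="utf-8", xml_declaration=True)
                # Re-zip every part; parts other than DATA_PART are copied unchanged.
                target.writestr(name, payload)
    return output.getvalue()


if __name__ == "__main__":
    # Build a tiny stand-in template: one part for layout, one for custom data.
    template_buffer = io.BytesIO()
    with zipfile.ZipFile(template_buffer, "w") as template:
        template.writestr("xl/workbook.xml", "<workbook/>")
        template.writestr(DATA_PART, "<sales/>")

    filled = fill_package(template_buffer.getvalue(), [{"region": "North", "total": "1200"}])
    with zipfile.ZipFile(io.BytesIO(filled)) as result:
        print(result.read(DATA_PART).decode("utf-8"))
```

Because every part other than the data part is copied byte for byte, the template's headings and formatting survive untouched; only the XML the process understands is changed.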
