One final important aspect of XML concerns formatting and document transformation. An XML Style Sheet, written in the Extensible Stylesheet Language (XSL), enables you to transform an XML description into a new document, for example an HTML document. HTML employs predefined tags whose meaning is well understood by all web page authors and browsers: for example, the <table> element starts a new table, which the user’s browser knows how to display. You can easily add styles to an HTML document by supplying a Cascading Style Sheet. However, XML doesn’t employ predefined tags and you’re free to choose any tag names you like. A <table> element in XML might mean an HTML table or a piece of furniture: it’s the XML Style Sheet that describes how each tagged element should be displayed.
XML Style Sheets can also be used to transform one XML document into another, possibly breaking down some elements or moving them around, and such manipulations aren’t uncommon in systems that exchange XML documents.
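To make the idea of a transformation concrete, here is a minimal sketch that turns a small XML document into an HTML list using the .NET XslCompiledTransform class from PowerShell. The sample document, the stylesheet and all the variable names are my own illustrative assumptions, not part of any real system.

```powershell
# Hypothetical sample data: a tiny XML document and a stylesheet that renders it as HTML.
$xml = @'
<books>
  <book><title>XML Basics</title></book>
  <book><title>PowerShell in Practice</title></book>
</books>
'@

$xslt = @'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="no"/>
  <xsl:template match="/books">
    <ul><xsl:for-each select="book"><li><xsl:value-of select="title"/></li></xsl:for-each></ul>
  </xsl:template>
</xsl:stylesheet>
'@

# Compile the stylesheet from the string above.
$transform = New-Object System.Xml.Xsl.XslCompiledTransform
$xsltReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $xslt))
$transform.Load($xsltReader)

# Run the transformation and capture the result as a string.
$xmlReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $xml))
$arguments = New-Object System.Xml.Xsl.XsltArgumentList   # no stylesheet parameters needed here
$output = New-Object System.IO.StringWriter
$transform.Transform($xmlReader, $arguments, $output)

$html = $output.ToString()
$html
```

The same stylesheet mechanism could just as well emit another XML vocabulary rather than HTML: only the templates would change, not the calling code.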
Processing XML in code
An XML document looks like a tree with a single root element and multiple child elements. You can process such a tree either on a node-by-node basis or by pulling the entire tree into memory and processing its various elements. Java developers will no doubt be familiar with the Simple API for XML (SAX), which supports node-by-node parsing. The whole-tree approach is known as the Document Object Model (DOM), and is flexible because it lets you navigate to any element in the tree as needed. On the other hand, SAX is faster and uses less memory because it doesn’t need to load the entire document, looking at just a single node at a time and reading only forwards. Both approaches work, but .NET doesn’t implement SAX directly: its forward-only equivalent is the XmlReader class.
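The difference between the two approaches is easier to see side by side. The following is a rough sketch only, using a made-up document held in a string: the first half walks the nodes one at a time with XmlReader, the second half loads the whole tree and jumps straight to the node it wants.

```powershell
# Hypothetical sample document used only for illustration.
$xml = '<library><book id="1"><title>XML Basics</title></book><book id="2"><title>PowerShell in Practice</title></book></library>'

# Node-by-node: XmlReader visits one node at a time and only moves forwards.
$reader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $xml))
$streamedTitles = @()
while ($reader.Read()) {
    if ($reader.NodeType -eq [System.Xml.XmlNodeType]::Element -and $reader.Name -eq 'title') {
        # Step onto the text node inside <title> and record its value.
        # (This assumes every <title> holds plain, non-empty text, as in the sample.)
        if ($reader.Read()) { $streamedTitles += $reader.Value }
    }
}
$reader.Close()

# Whole tree: the DOM holds everything in memory, so any node can be reached directly.
$doc = [xml]$xml
$domTitles = @($doc.library.book | ForEach-Object { $_.title })
$second = $doc.SelectSingleNode('/library/book[@id="2"]/title').InnerText

$streamedTitles
$second
```

For a document this small the choice hardly matters; the streaming style tends to pay off only with very large files, where holding the full tree in memory becomes expensive.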
I’m going to use PowerShell to illustrate the manipulation of XML documents, for a couple of reasons. First, PowerShell provides a great set of tools for exploring XML documents, since it utilises the .NET Framework’s XML capabilities directly, including all of the System.Xml namespace; second, with XML becoming so important to IT professionals, you may find some of the techniques I describe useful when managing XML documents. A word of warning, though: it may take a bit of time to get comfortable with XML processing within PowerShell. It took me a while, but it quickly became obvious how powerful PowerShell can be in processing XML documents.
Open up a PowerShell window now and try out some of these commands (I’ll put the XML documents I use in the column onto my website at www.reskit.net/pcpro). To open up an XML document in PowerShell, issue the following command:
$doc= [xml] (cat book.xml)
This is a neat bit of shorthand that in effect opens the file and creates an XML document based on any text in the file. For those more used to C#, the following PowerShell syntax would also work and might be more familiar to you:
$doc = New-Object xml
$doc.Load("book.xml")
In this case, I first create a new XML document object, and only then use a built-in method to load the file. In both cases, PowerShell creates a variable called $doc, which is an XML document, and you can access $doc to get at the information contained in the XML document. Opening the document this way uses the DOM, and PowerShell creates an in-memory representation of the document you can then manipulate.
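If you don’t have a suitable file to hand, the sketch below writes a tiny sample document to the temp folder and then loads it both ways. The file name and its contents are hypothetical. One point worth knowing: the Load method resolves a relative path against the process’s working directory, which isn’t necessarily PowerShell’s current location, so passing a full path is the safer habit.

```powershell
# Hypothetical sample file, written to the temp folder so the sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'book-sample.xml'
Set-Content -Path $path -Value '<book><title>XML Basics</title><author>A. Writer</author></book>'

# Approach 1: read the file's text and cast it to an XML document.
$doc1 = [xml](Get-Content $path)

# Approach 2: create an empty document object, then load the file into it.
# System.Xml.XmlDocument is the full name of the type behind the [xml] shorthand.
$doc2 = New-Object System.Xml.XmlDocument
$doc2.Load($path)   # full path, so the working-directory issue doesn't arise

# Either way, element names show up as properties on the document object.
$doc1.book.title
$doc2.book.author

# Tidy up the sample file.
Remove-Item $path
```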
To see the details of all the properties and methods implemented by the $doc object, you can use PowerShell’s Get-Member cmdlet, as follows:
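A minimal sketch of that call, assuming a small stand-in document rather than the real book.xml (the sample content and the $selectMethods variable are mine):

```powershell
# Hypothetical sample document standing in for book.xml.
$doc = [xml]'<book><title>XML Basics</title></book>'

# List every property and method available on the document object.
$doc | Get-Member

# Narrow the list to the methods whose names start with "Select".
$selectMethods = $doc | Get-Member -MemberType Method -Name Select*
$selectMethods | ForEach-Object { $_.Name }
```

The full listing is long, so filtering by member type or by name, as in the second call, is usually the quickest way to find what you’re after.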