XML through PowerShell
XML is fast becoming a must-know technology for IT professionals. Under Windows Vista and Longhorn Server it’s a core technology, used for event log entries, unattended setup files and Group Policy settings, so 2007 is the year to learn more about it. In this month’s column, I’ll look at what XML is and how to use it. Instead of the usual C#, I’ll illustrate with PowerShell, which gives you direct access to .NET. PowerShell’s built-in reflection capability, exposed through the Get-Member cmdlet, also lets you explore the XML classes that .NET provides.
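As a first taste, here is a minimal sketch that casts a string with PowerShell’s [xml] type accelerator, which produces a System.Xml.XmlDocument object, and then uses Get-Member to list the methods that class exposes. The one-element document is just an illustration; any well-formed XML string would do.

```powershell
# Casting a string with the [xml] type accelerator creates a .NET XmlDocument
$doc = [xml]'<Book><Title>Windows Server 2003 TCP/IP Protocols and Services</Title></Book>'

# Confirm the underlying .NET type
$doc.GetType().FullName    # System.Xml.XmlDocument

# Reflection via Get-Member: list the methods XmlDocument makes available
Get-Member -InputObject $doc -MemberType Method |
    Select-Object -ExpandProperty Name
```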
XML is a mark-up language for encoding structured information in a form that both computers and humans can read. It looks superficially similar to HTML, since both derive from the older Standard Generalized Mark-up Language (SGML). An XML document consists of textual data together with mark-up tags that describe the structure of that data: each pair of tags gives a data item a name and encloses the data itself. For example, an XML document containing information about a book might look like this:
<Title>Windows Server 2003 TCP/IP Protocols and Services</Title>
As you can see, the basic syntax of XML is straightforward: a series of tags defines the contents and structure of the information about a specific book. As in HTML, tags are enclosed in angle brackets and generally come in pairs: an opening tag followed by a closing tag, formed by prefixing / to the tag name. In the example, the title of the book is an element enclosed by an opening <Title> tag and a closing </Title> tag. Unlike HTML, XML tag names are case sensitive, so the closing tag must be </Title>, not </title>; the latter would leave the <Title> element unclosed, and the document would not be well-formed. A tag can also stand on its own, as with <InPrint/>, where a / follows the tag name. Such empty tags are often used as flags, in this case to denote that the book is still in print; the same information could instead have been written as <InPrint>Yes</InPrint>.
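Both rules are easy to check from PowerShell. This sketch wraps XmlDocument.LoadXml, which throws an exception when its input isn’t well-formed, in a small helper; Test-WellFormedXml is a hypothetical name of my own, not a built-in cmdlet. It then uses the XmlElement.IsEmpty property to confirm that <InPrint/> is parsed as an empty element, and reads the text of the <InPrint>Yes</InPrint> alternative.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns $true if $Text is well-formed XML
function Test-WellFormedXml([string]$Text) {
    $doc = New-Object System.Xml.XmlDocument
    try   { $doc.LoadXml($Text); return $true }   # LoadXml throws on malformed input
    catch { return $false }
}

# Tag names are case sensitive: </title> does not close <Title>
Test-WellFormedXml '<Title>Windows Server 2003</Title>'   # True
Test-WellFormedXml '<Title>Windows Server 2003</title>'   # False

# A standalone tag such as <InPrint/> is parsed as an empty element
$flag = [xml]'<InPrint/>'
$flag.DocumentElement.IsEmpty       # True

# The alternative form carries the flag's value as element text instead
$valued = [xml]'<InPrint>Yes</InPrint>'
$valued.InPrint                     # Yes
```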
Tags can be nested to express deeper structure: the <Authors> element holds two <Author> elements, reflecting the fact that a book can have more than one author. An opening tag can also carry attributes, which are name-value pairs; in the <Price> tag, the attribute “currency” is given the value “sterling”.
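To see how nesting and attributes surface in PowerShell, here is a complete book document along the lines described above. The author names and the price are placeholders of my own, not the values from the original example. PowerShell’s XML adapter exposes child elements and attributes as ordinary properties, so a nested element becomes a property path, and an element that repeats comes back as an array.

```powershell
# A book document as described in the text.
# The author names and the price are placeholders (hypothetical values).
$book = [xml]@'
<Book>
  <Title>Windows Server 2003 TCP/IP Protocols and Services</Title>
  <Authors>
    <Author>First Author</Author>
    <Author>Second Author</Author>
  </Authors>
  <Price currency="sterling">49.99</Price>
  <InPrint/>
</Book>
'@

# Child elements appear as properties, so nesting becomes a property path
$book.Book.Title                   # Windows Server 2003 TCP/IP Protocols and Services
$book.Book.Authors.Author.Count    # 2: repeated elements come back as an array
$book.Book.Authors.Author[0]       # First Author

# Attributes appear as properties too; the element's text is in '#text'
$book.Book.Price.currency          # sterling
$book.Book.Price.'#text'           # 49.99
```

An element that has attributes, like <Price>, comes back as an element object rather than a plain string, which is why its text has to be read through the '#text' property.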
An XML document is composed of one or more elements, each of which may have zero, one or more child elements. A well-formed document must have exactly one root element that contains all the others; in the example, this is the <Book> element. There’s far more to XML syntax than I have space for here; for a good set of tutorials, check out www.w3schools.com/default.asp.
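The .NET parser enforces the single-root rule. In this sketch, XmlDocument.LoadXml rejects a string with two top-level elements, while a document with one root loads normally; the DocumentElement property then returns that root, and ChildNodes lists its children. The sample documents are illustrative only.

```powershell
# Two top-level elements: there is no single root, so loading fails
$doc = New-Object System.Xml.XmlDocument
try {
    $doc.LoadXml('<Book/><Book/>')
    $twoRootsLoaded = $true
}
catch {
    $twoRootsLoaded = $false
    "Rejected: $($_.Exception.Message)"
}

# One root containing everything else: loads, and the root is DocumentElement
$book = [xml]'<Book><Title>T</Title><Authors><Author>A</Author><Author>B</Author></Authors><InPrint/></Book>'
$book.DocumentElement.Name              # Book
$book.DocumentElement.ChildNodes.Count  # 3: Title, Authors and InPrint
$book.Book.Authors.ChildNodes.Count     # 2
```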
So much for XML syntax, but what do the individual elements mean? For example, what does <Price> represent in the book document? Is it an amount of currency and, if so, which one? Or is it some internal index number? An XML schema is where you describe the meaning to be attached to the contents of an XML document. A schema written in the W3C XML Schema language (XSD) is itself an XML document, which specifies which elements may appear in the document, the order in which they appear, how many child elements each may have, the data types of their contents, and so on. For my book example, I could create a schema that defines both what a book document must look like (it has a root element called <Book>, and so on) and what individual elements represent (<Price> is the selling price in pounds sterling). If the document were validated against a schema that strict, the currency attribute would be unnecessary.
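Here is a minimal sketch of schema validation from PowerShell. The XSD is my own illustration for the book document, not a published schema: the element order, the requirement for at least one author, the xs:decimal type for <Price> and its documentation note are all assumptions. Because the schema itself says <Price> is in pounds sterling, the sample document drops the currency attribute. The sketch loads the schema into a System.Xml.Schema.XmlSchemaSet and calls XmlDocument.Validate; with a null event handler, Validate throws on the first validation error. Test-BookDocument is a hypothetical helper name.

```powershell
# Hypothetical XSD for the book document (structure and types are assumptions)
$xsd = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Title" type="xs:string"/>
        <xs:element name="Authors">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Author" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="Price" type="xs:decimal">
          <xs:annotation>
            <xs:documentation>Selling price in pounds sterling</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="InPrint" minOccurs="0">
          <xs:complexType/>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Load the schema into a schema set and compile it
$schemas = New-Object System.Xml.Schema.XmlSchemaSet
$reader  = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $xsd))
[void]$schemas.Add($null, $reader)
$reader.Close()
$schemas.Compile()

# Hypothetical helper: $true if $Text is well-formed and valid against $schemas
function Test-BookDocument([string]$Text) {
    try {
        $doc = New-Object System.Xml.XmlDocument
        $doc.LoadXml($Text)
        $doc.Schemas = $schemas
        $doc.Validate($null)    # with no event handler, the first error throws
        return $true
    }
    catch { return $false }
}

# Placeholder author names and price (hypothetical values)
$valid = @'
<Book>
  <Title>Windows Server 2003 TCP/IP Protocols and Services</Title>
  <Authors><Author>First Author</Author><Author>Second Author</Author></Authors>
  <Price>49.99</Price>
  <InPrint/>
</Book>
'@
$badPrice = $valid.Replace('49.99', 'forty-nine')

Test-BookDocument $valid       # True
Test-BookDocument $badPrice    # False: 'forty-nine' is not an xs:decimal
```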