Parsing WordML using XLinq

Over the last couple of weeks I've been reading a lot of the documentation that Eric White put together for XLinq. Eric is a "Programmer Writer" for several Xml technologies, which makes him responsible for a good bit of the XLinq documentation.

Yesterday Eric posted a nice writeup of how he managed to parse WordML using XLinq. Eric's writeup shows off many of XLinq's nice features, and I'd like to highlight a couple of them. The obvious first feature to talk about is how good XLinq's querying capabilities are for Xml. I think we've all come to know and love the querying that Linq provides, so let's move on to some of the other XLinq features highlighted in Eric's post.

Let's take a look at the query used to retrieve all the comments (annotations whose type is "Word.Comment") in the Word document. Notice that rather than having to deal with XmlConvert, we can simply cast our attribute to a string. Explicit cast operators have been defined for the classes you'll use most when working with XLinq (XAttribute and XElement). They let programmers work with the data in an Xml fragment in a very familiar manner. Rather than calling XmlConvert everywhere we read content out of an attribute or element, we can simply cast the value to the proper type and let XLinq handle all the dirty work.

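            // Select every aml:annotation whose w:type attribute is "Word.Comment".
            // The (string) cast yields null, rather than throwing, when the attribute is missing.
            var commentNodes =
                from annos in wordDoc.Descendants(aml + "annotation")
                where (string)annos.Attribute(w + "type") == "Word.Comment"
                select annos;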
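The following is a minimal sketch that makes the cast operators concrete. It is written against LINQ to XML as it later shipped in System.Xml.Linq (.NET Framework 3.5), and the 2006 CTP that Eric used may differ in details. The CastDemo class and the trimmed annotation element are hypothetical, and the aml:id and aml:createdate values are invented for illustration. The sketch contrasts a manual XmlConvert conversion with the explicit casts. It also shows that casting a missing attribute to a nullable type yields null.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class CastDemo
{
    public static readonly XNamespace Aml = "http://schemas.microsoft.com/aml/2001/core";
    public static readonly XNamespace W = "http://schemas.microsoft.com/office/word/2003/wordml";

    // Hypothetical, simplified annotation; real WordML annotations carry more content.
    public static XElement SampleAnnotation()
    {
        return new XElement(Aml + "annotation",
            new XAttribute(Aml + "id", "3"),
            new XAttribute(W + "type", "Word.Comment"),
            new XAttribute(Aml + "createdate", "2006-08-01T10:15:00Z"));
    }

    public static void Main()
    {
        XElement anno = SampleAnnotation();

        // Without casts: read the raw string and convert it by hand with XmlConvert.
        int idByHand = XmlConvert.ToInt32(anno.Attribute(Aml + "id").Value);

        // With explicit casts: XAttribute converts its own value to the requested type.
        int id = (int)anno.Attribute(Aml + "id");
        string type = (string)anno.Attribute(W + "type");
        DateTime created = (DateTime)anno.Attribute(Aml + "createdate");

        // Attribute() returns null for a missing attribute; a nullable cast turns that
        // into a null value, whereas a cast to plain int would throw.
        int? missing = (int?)anno.Attribute(Aml + "doesNotExist");

        Console.WriteLine($"{idByHand} {id} {type} {created:yyyy-MM-dd} {missing.HasValue}");
    }
}
```

The nullable casts are handy for optional attributes, because they remove the separate null check you would otherwise write before converting.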

Another nicety to point out is how easy it is to deal with the namespaces used in WordML. With the existing Xml APIs we'd have to new up a namespace manager, set up our namespaces, and then make sure everything that "queries" our document uses that namespace manager. Not a huge deal, but more painful than necessary. Within XLinq we can forget about namespace managers, since all "Xml Names" within XLinq are fully qualified. This means that whenever we query our document, we give the full namespace and local name of the element or attribute we're interested in.
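For comparison, the sketch below runs the same comment query twice against a hypothetical, heavily trimmed WordML-style document. The first version uses XmlDocument and XPath, which require an XmlNamespaceManager to be passed to SelectNodes. The second uses XLinq's fully qualified names. The sketch assumes System.Xml.Linq as shipped in .NET Framework 3.5. The NamespaceDemo class, the sample document, and its attribute values are illustrative only.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical, heavily trimmed WordML-style document used only for illustration.
    public const string Sample =
        "<w:wordDocument xmlns:w='http://schemas.microsoft.com/office/word/2003/wordml' " +
        "xmlns:aml='http://schemas.microsoft.com/aml/2001/core'><w:body>" +
        "<aml:annotation aml:id='0' w:type='Word.Comment'/>" +
        "<aml:annotation aml:id='1' w:type='Word.Insertion'/>" +
        "<aml:annotation aml:id='2' w:type='Word.Comment'/>" +
        "</w:body></w:wordDocument>";

    // Classic System.Xml: prefixes must be registered with a manager,
    // and the manager must be handed to every XPath query.
    public static int CountCommentsWithXPath(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("aml", "http://schemas.microsoft.com/aml/2001/core");
        nsmgr.AddNamespace("w", "http://schemas.microsoft.com/office/word/2003/wordml");
        XmlNodeList nodes = doc.SelectNodes("//aml:annotation[@w:type='Word.Comment']", nsmgr);
        return nodes.Count;
    }

    // XLinq: every name is fully qualified, so no manager is involved.
    public static int CountCommentsWithXLinq(string xml)
    {
        XNamespace aml = "http://schemas.microsoft.com/aml/2001/core";
        XNamespace w = "http://schemas.microsoft.com/office/word/2003/wordml";
        XDocument wordDoc = XDocument.Parse(xml);
        return (from annos in wordDoc.Descendants(aml + "annotation")
                where (string)annos.Attribute(w + "type") == "Word.Comment"
                select annos).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountCommentsWithXPath(Sample));
        Console.WriteLine(CountCommentsWithXLinq(Sample));
    }
}
```

In the XPath version, the prefixes in the query string are resolved through the manager rather than through the document's own prefixes. This is why the manager has to accompany every query.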

In Eric's sample code we can see how namespaces are handled by looking at how they're defined and then at how they're used when querying the document. Let's start with a look at how the namespaces are defined:


            XNamespace aml = "http://schemas.microsoft.com/aml/2001/core";
            XNamespace w = "http://schemas.microsoft.com/office/word/2003/wordml";

As we can see, an XNamespace consists of an Xml namespace URI. XLinq defines implicit conversion operators for the XNamespace and XName classes, which allow strings to be automagically converted to those classes. Very nice. Once the namespaces have been defined, it's simply a matter of combining the namespace with each local name when issuing our queries (adding a string to an XNamespace yields an XName).

            // aml + "annotation" and w + "type" each combine a namespace and a local name into an XName.
            var commentNodes =
                from annos in wordDoc.Descendants(aml + "annotation")
                where (string)annos.Attribute(w + "type") == "Word.Comment"
                select annos;
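The following sketch shows the implicit conversions in isolation. It again assumes System.Xml.Linq as shipped in .NET Framework 3.5, and XNameDemo is a hypothetical name. Adding a local name to an XNamespace produces an XName. A string in the expanded "{namespace}localname" form converts to that same XName.

```csharp
using System;
using System.Xml.Linq;

public static class XNameDemo
{
    // Implicit conversion: string -> XNamespace.
    public static readonly XNamespace W = "http://schemas.microsoft.com/office/word/2003/wordml";

    // XNamespace + string -> a fully qualified XName.
    public static readonly XName TypeName = W + "type";

    // Implicit conversion: string -> XName, using the expanded "{namespace}localname" form.
    public static readonly XName SameName = "{http://schemas.microsoft.com/office/word/2003/wordml}type";

    public static void Main()
    {
        Console.WriteLine(TypeName.NamespaceName); // the namespace URI
        Console.WriteLine(TypeName.LocalName);     // the local name, "type"
        Console.WriteLine(TypeName);               // the expanded "{namespace}type" form
        Console.WriteLine(TypeName == SameName);   // same namespace and local name
    }
}
```

XName instances are atomized, so two names with the same namespace and local name share one instance. As a result, the == comparison in the last line is a cheap reference check.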

Several of the features discussed in this post are minor API usability improvements, but taken as a whole they result in a much more usable and productive environment for working with Xml.



# re: Parsing WordML using XLinq

Wednesday, August 02, 2006 6:30 PM by Sam Gentile    
Nice post. So can we just have this now? ;-)
