In this post we aim to transform a text file into a hierarchical XML document. As shown in Listing 12.11, each line of the comma-separated text file holds one book’s ISBN, title, author(s), publisher, publication date, and price.
Listing 12.11 CSV of Books
0735621632,CLR via C#,Jeffrey Richter,Microsoft Press,02-22-2006,59.99
0321127420,Patterns Of Enterprise Application Architecture,Martin Fowler,Addison-Wesley Professional,11-05-2002,54.99
0321200683,Enterprise Integration Patterns,Gregor Hohpe,Addison-Wesley Professional,10-10-2003,54.99
0321125215,Domain-Driven Design,Eric Evans,Addison-Wesley Professional,08-22-2003,54.99
1932394613,Ajax In Action,Dave Krane;Eric Pascarello;Darren James,Manning Publications,10-01-2005,44.95
Our goal is to parse the text file and produce the XML hierarchy shown in Listing 12.12:
Listing 12.12 XML Output
<?xml version="1.0" encoding="utf-8" ?>
<books>
  <book>
    <title>CLR via C#</title>
    <authors>
      <author>
        <firstName>Jeffrey</firstName>
        <lastName>Richter</lastName>
      </author>
    </authors>
    <publisher>Microsoft Press</publisher>
    <publicationDate>02-22-2006</publicationDate>
    <price>59.99</price>
    <isbn>0735621632</isbn>
  </book>
  <book>
    <title>Patterns Of Enterprise Application Architecture</title>
    <authors>
      <author>
        <firstName>Martin</firstName>
        <lastName>Fowler</lastName>
      </author>
    </authors>
    <publisher>Addison-Wesley Professional</publisher>
    <publicationDate>11-05-2002</publicationDate>
    <price>54.99</price>
    <isbn>0321127420</isbn>
  </book>
  …
</books>
We build the XML bottom-up with functional construction, intertwining it with query expressions that select the relevant data out of each line of the file.
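To make functional construction concrete, here is a minimal sketch written as C# 9 top-level statements, using Linq to XML from the System.Xml.Linq namespace; the titles array is hypothetical sample data. Because constructor arguments are evaluated before the constructor runs, nested XElement expressions build each child before its parent. A query passed as content is an IEnumerable<XElement>; the parent's constructor enumerates it and adds each yielded element as a child.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory data standing in for parsed lines.
string[] titles = { "CLR via C#", "Domain-Driven Design" };

// Nested XElement arguments are evaluated first, so each <title> exists
// before its <book>. The query below is enumerated by the <books>
// constructor, and every element it yields becomes a child of <books>.
XElement books = new XElement("books",
    from title in titles
    select new XElement("book",
        new XElement("title", title)));

Console.WriteLine(books);
```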
To create the desired XML we’ll need to open the text file, split each line into an array of fields, and place each field into the appropriate XML element. Let’s start by opening the file and splitting each line into its parts.
from line in File.ReadAllLines("books.txt")
let items = line.Split(',')
// add functional construction statements for creating the XML
We use the static ReadAllLines method of the File class (in System.IO) to read every line of the text file. Because ReadAllLines returns a string array, which implements IEnumerable<string>, we can use it directly as the source of our from clause. To split each line we use the Split method on string together with C#’s let clause, which performs the split once per line and lets us refer to the resulting array in the clauses that follow. Once each line is split apart, we can wrap each item in the appropriate XML element.
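To try the from/let combination without a books.txt file, the sketch below substitutes an in-memory array holding two lines copied from Listing 12.11 (an assumption for illustration; File.ReadAllLines returns a string[] of the same shape). The let clause names the array produced by Split so the select clause can index into it, here projecting into an anonymous type rather than XML to keep the focus on the split.

```csharp
using System;
using System.Linq;

// Two lines from Listing 12.11; File.ReadAllLines("books.txt") would
// return a string[] of the same shape.
string[] lines =
{
    "0735621632,CLR via C#,Jeffrey Richter,Microsoft Press,02-22-2006,59.99",
    "0321125215,Domain-Driven Design,Eric Evans,Addison-Wesley Professional,08-22-2003,54.99"
};

// 'let' splits each line once; later clauses reuse 'items'.
var books =
    from line in lines
    let items = line.Split(',')
    select new { Isbn = items[0], Title = items[1], Price = items[5] };

foreach (var book in books)
    Console.WriteLine($"{book.Isbn}: {book.Title} ({book.Price})");
```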
var booksXml = new XElement("books",
    from line in File.ReadAllLines("books.txt")
    let items = line.Split(',')
    select new XElement("book",
        new XElement("title", items[1]),
        new XElement("publisher", items[3]),
        new XElement("publicationDate", items[4]),
        new XElement("price", items[5]),
        new XElement("isbn", items[0])
    )
);
We conveniently left the authors out of the above query since they require a little extra work. Unlike the other fields in our text file, there can be more than one author specified for a single book. If we go back and review the sample text file, we see that the authors are delimited by a semicolon (“;”).
Dave Krane;Eric Pascarello;Darren James
As we did with the entire line, we can Split the string of authors into an array, with each author as an individual element. To be sure we get our fill of Split, we use it one final time to break each full author name into first and last name parts at the space character; this assumes every name consists of exactly a first and a last name. Finally, we place the statements that parse out the authors into a query and wrap the results of our many splits in the appropriate XML.
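The author-parsing query can be tried on its own. This sketch runs the same two-level split against the author field from the last line of Listing 12.11, building only the <authors> subtree; the standalone authorField variable is an assumption standing in for items[2].

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The author field from the last line of Listing 12.11 (stands in for items[2]).
string authorField = "Dave Krane;Eric Pascarello;Darren James";

// First Split: one entry per author. Second Split: first/last name.
// Like the post's query, this assumes each name is exactly "First Last".
XElement authors = new XElement("authors",
    from authorFullName in authorField.Split(';')
    let authorNameParts = authorFullName.Split(' ')
    select new XElement("author",
        new XElement("firstName", authorNameParts[0]),
        new XElement("lastName", authorNameParts[1])));

Console.WriteLine(authors);
```

As in the post’s version, a name containing a middle name or initial would lose everything after the second part; splitting only at the last space (for example with LastIndexOf(' ')) is one possible alternative.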
…
new XElement("authors",
from authorFullName in items[2].Split(';')
let authorNameParts = authorFullName.Split(' ')
select new XElement("author",
new XElement("firstName", authorNameParts[0]),
new XElement("lastName", authorNameParts[1])
)
)
…
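When we put it all together we get the final solution, shown in Listing 12.13. It also adds a where clause that skips any line beginning with “#”, so the text file can contain comment lines.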
Listing 12.13 Final Implementation
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

namespace LinqToXmlSamples.FlatFileToXml {
    class Program {
        static void Main(string[] args) {
            XElement xml =
                new XElement("books",
                    from line in File.ReadAllLines("books.txt")
                    where !line.StartsWith("#")   // skip comment lines
                    let items = line.Split(',')
                    select new XElement("book",
                        new XElement("title", items[1]),
                        new XElement("authors",
                            // one <author> per semicolon-delimited name
                            from authorFullName in items[2].Split(';')
                            let authorNameParts = authorFullName.Split(' ')
                            select new XElement("author",
                                new XElement("firstName", authorNameParts[0]),
                                new XElement("lastName", authorNameParts[1])
                            )
                        ),
                        new XElement("publisher", items[3]),
                        new XElement("publicationDate", items[4]),
                        new XElement("price", items[5]),
                        new XElement("isbn", items[0])
                    )
                );
            Console.WriteLine(xml);
        }
    }
}
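Listing 12.13 reads books.txt from disk. To try the same construction without a file, the sketch below swaps File.ReadAllLines for an in-memory array (an assumption for illustration) that begins with a hypothetical “#” comment line, which the where clause filters out; the rest of the expression matches the listing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical input: a comment line plus two lines from Listing 12.11.
string[] lines =
{
    "# ISBN,Title,Author(s),Publisher,Publication Date,Price",
    "0735621632,CLR via C#,Jeffrey Richter,Microsoft Press,02-22-2006,59.99",
    "1932394613,Ajax In Action,Dave Krane;Eric Pascarello;Darren James,Manning Publications,10-01-2005,44.95"
};

XElement xml = new XElement("books",
    from line in lines
    where !line.StartsWith("#")          // skip comment lines
    let items = line.Split(',')
    select new XElement("book",
        new XElement("title", items[1]),
        new XElement("authors",
            from authorFullName in items[2].Split(';')
            let authorNameParts = authorFullName.Split(' ')
            select new XElement("author",
                new XElement("firstName", authorNameParts[0]),
                new XElement("lastName", authorNameParts[1]))),
        new XElement("publisher", items[3]),
        new XElement("publicationDate", items[4]),
        new XElement("price", items[5]),
        new XElement("isbn", items[0])));

Console.WriteLine(xml);
```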
As we’ve seen over and over again, Linq to XML lets us mix and match data from varied sources within functional construction statements. The result is a consistent programming API: whether the source data is relational, a set of objects, or a text file, the way we create XML from it stays consistent and predictable.
Tags: xlinq, linq to xml, linq
When working with XML data we inevitably have to concern ourselves with XML namespaces and namespace prefixes. The Linq to XML API has been designed to make dealing with namespaces and namespace prefixes as direct and straightforward as possible. Rather than dealing with XmlNamespaceManager objects and the like, we simply reference all of our elements and attributes by their fully expanded name (namespace + local name).
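A minimal sketch of what “fully expanded name” means in practice. The XML fragment is a made-up sample that places its elements in the Amazon namespace used later in this post. Adding a string to an XNamespace yields an XName, and queries match on that expanded name, so a bare local name finds nothing when the elements live in a namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace ns = "http://webservices.amazon.com/AWSECommerceService/2005-10-05";

// Hypothetical fragment whose elements live in a default namespace.
XElement doc = XElement.Parse(
    "<ItemSearchResponse xmlns='http://webservices.amazon.com/AWSECommerceService/2005-10-05'>" +
    "<Item><ItemAttributes><Title>CLR via C#</Title></ItemAttributes></Item>" +
    "</ItemSearchResponse>");

// ns + "Item" builds an XName: namespace URI plus local name.
XName itemName = ns + "Item";
Console.WriteLine(itemName); // {http://webservices.amazon.com/AWSECommerceService/2005-10-05}Item

// Matching uses the expanded name, so a bare "Item" (no namespace) finds nothing.
Console.WriteLine(doc.Descendants(itemName).Count()); // 1
Console.WriteLine(doc.Descendants("Item").Count());   // 0
```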
While this simplification makes namespaces easier to work with, I don't think it goes far enough. We still have to remember to include our namespace in every query we write, and when working with XML trees that contain only one default namespace I'd like something simpler.
Enter the XNamespaceScope class. This proposed class would be used much like the TransactionScope class is used to manage transactions. When we're about to work with an XML tree that contains only one namespace we're interested in, we would create an XNamespaceScope instance, place it in a using block around our query expressions, and have Linq to XML apply the XNamespace passed to the XNamespaceScope to every query within the block. So rather than writing this code, where we have to repeat our namespace (ns) over and over:
XNamespace ns = "http://webservices.amazon.com/AWSECommerceService/2005-10-05";
var booksToImport =
    from amazonItem in amazonXml.Descendants(ns + "Item")
    let attributes = amazonItem.Element(ns + "ItemAttributes")
    select new Book {
        Isbn=(string) attributes.Element(ns + "ISBN"),
        Title=(string) attributes.Element(ns + "Title"),
        PubDate=(DateTime) attributes.Element(ns + "PublicationDate"),
        Price=ParsePrice(attributes.Element(ns + "ListPrice")),
        BookAuthors=GetAuthors(attributes.Elements(ns + "Author"))
    };
We would instead write:
XNamespace ns = "http://webservices.amazon.com/AWSECommerceService/2005-10-05";
using(new XNamespaceScope(ns)) {
    var booksToImport =
        from amazonItem in amazonXml.Descendants("Item")
        let attributes = amazonItem.Element("ItemAttributes")
        select new Book {
            Isbn=(string) attributes.Element("ISBN"),
            Title=(string) attributes.Element("Title"),
            PubDate=(DateTime) attributes.Element("PublicationDate"),
            Price=ParsePrice(attributes.Element("ListPrice")),
            BookAuthors=GetAuthors(attributes.Elements("Author"))
        };
}
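Because XNamespaceScope is a proposal, the snippet above won't compile against Linq to XML as it ships. One lightweight way to cut the repetition with the existing API is to capture the namespace once in a small helper. The sketch below uses a local function N (a hypothetical name) and XElement.GetDefaultNamespace() to read the namespace from the document instead of hard-coding it. The Amazon-shaped XML is a made-up sample, and the projection is reduced to an anonymous type because Book, ParsePrice, and GetAuthors are not shown in the post.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Made-up sample shaped like the Amazon response used above.
XElement amazonXml = XElement.Parse(
    "<ItemSearchResponse xmlns='http://webservices.amazon.com/AWSECommerceService/2005-10-05'>" +
    "<Items><Item><ItemAttributes>" +
    "<ISBN>0735621632</ISBN><Title>CLR via C#</Title>" +
    "</ItemAttributes></Item></Items>" +
    "</ItemSearchResponse>");

// Read the default namespace from the root rather than hard-coding it.
XNamespace ns = amazonXml.GetDefaultNamespace();

// Hypothetical helper: expands a local name in the captured namespace.
XName N(string localName) => ns + localName;

var booksToImport =
    from amazonItem in amazonXml.Descendants(N("Item"))
    let attributes = amazonItem.Element(N("ItemAttributes"))
    select new
    {
        Isbn = (string)attributes.Element(N("ISBN")),
        Title = (string)attributes.Element(N("Title"))
    };

foreach (var book in booksToImport)
    Console.WriteLine($"{book.Isbn}: {book.Title}");
```

A design note on the trade-off: query expressions use deferred execution, so the let and select clauses run when booksToImport is enumerated, which could happen after a using block has ended. An ambient scope such as the proposed XNamespaceScope would need to account for that timing, whereas a helper that closes over ns has no such dependency.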
Thoughts?
Tags: linq, xlinq, linqtoxml