Friday, 14 August 2015

Parsing XML: XPath with JDOM2

About parsing XML, Horstmann & Cornell write: “To process an XML document, you need to parse it. A parser is a program that reads a file, confirms that the file has the correct format, breaks it up into the constituent elements, and lets a programmer access those elements. The Java library supplies two kinds of XML parsers:
  • Tree parsers, such as the Document Object Model (DOM) parser, that read an XML document into a tree structure.
  • Streaming parsers, such as the Simple API for XML (SAX) parser, that generate events as they read an XML document.
The DOM parser is easier to use for most purposes.”[1]

What is JDOM2? As Studytrails puts it: “JDOM is an in-memory XML model that can be used to read, write, create and modify XML Documents. JDOM is similar to DOM in that they both provide an in-memory XML document model, but while DOM is designed to work the same in multiple languages (C, C++, ECMAScript, Java, JScript, Lingo, PHP, PLSQL, and Python), JDOM is designed only for Java and uses the natural Java-specific features that the DOM model avoids. For this reason JDOM intentionally does not follow the w3c DOM standard. JDOM versions since JDOM 2.0.0 (JDOM2) all use the native language features of Java6 and later like Generics, Enums, var-args, co-variant return types, etc.”[2]

With JDOM2, you can process XML more elegantly and in fewer lines of code than with DOM.[3]

XPath stands for XML Path Language. Wikipedia describes it as a “query language for selecting nodes from an XML document.”[4]

Horstmann & Cornell explain why you should choose XPath: “If you want to locate a specific piece of information in an XML document, it can be a bit of a hassle to navigate the nodes of the DOM tree. The XPath language makes it simple to access tree nodes.”[1]
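
To make that quote concrete, here is a small sketch of an XPath query using the XPath support that ships with the JDK (javax.xml.xpath), not JDOM2. The sample XML, its element names, and the XPathSketch class are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Hypothetical sample document; the element names are assumptions.
    static final String SAMPLE_XML =
            "<customer id=\"42\">"
          + "<firstName>Ada</firstName>"
          + "<lastName>Lovelace</lastName>"
          + "<email>ada@example.com</email>"
          + "</customer>";

    // Evaluates an XPath expression against the sample document and
    // returns the string value of the first matching node.
    static String select(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(SAMPLE_XML));
        return xpath.evaluate(expression, source);
    }

    public static void main(String[] args) throws XPathExpressionException {
        System.out.println(select("/customer/firstName")); // prints "Ada"
        System.out.println(select("/customer/@id"));       // prints "42"
    }
}
```

One path string replaces the loop over child nodes that plain DOM navigation would need.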

So there we are. In Java, if we want to parse an XML file and use the data, our choice of tools is JDOM2 and XPath. Let’s see an example of how to do it. Suppose we have the file “customer.xml” with the following data:

We want to fetch the customer's first name, last name, and email, and populate a Customer object called customer.

You would work with objects of File, SAXBuilder, Document, XPathFactory, XPathExpression, and Element. You navigate to the XML elements with a path string that looks much like a file path, such as “/home/mahboob/documents/bank-project…”. You pass that string to XPathFactory’s compile method, which returns an XPathExpression. Then you just have to evaluate the XPathExpression and read the resulting element(s) with methods like getAttributeValue or getValue. The code given below makes it very clear:
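
A minimal sketch of these steps, assuming JDOM2 and its Jaxen dependency are on the classpath. The layout of customer.xml (a customer root with an id attribute and firstName, lastName, and email children), the Customer class, and the class name CustomerXPathDemo are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

public class CustomerXPathDemo {

    // Hypothetical stand-in for the Customer class mentioned in the text.
    static class Customer {
        String id;
        String firstName;
        String lastName;
        String email;
    }

    // Hypothetical contents of customer.xml; the real file may differ.
    static final String SAMPLE_XML =
            "<customer id=\"1001\">"
          + "<firstName>Ada</firstName>"
          + "<lastName>Lovelace</lastName>"
          + "<email>ada@example.com</email>"
          + "</customer>";

    static Customer readCustomer(File xmlFile) throws JDOMException, IOException {
        // Parse the file into an in-memory JDOM2 document.
        Document document = new SAXBuilder().build(xmlFile);

        // Compile the path; the filter makes the results typed as Element.
        XPathExpression<Element> expression =
                XPathFactory.instance().compile("/customer", Filters.element());

        // Evaluate against the document and take the first match.
        Element element = expression.evaluateFirst(document);
        if (element == null) {
            throw new IllegalStateException("No <customer> element found");
        }

        Customer customer = new Customer();
        customer.id = element.getAttributeValue("id");
        customer.firstName = element.getChildText("firstName");
        customer.lastName = element.getChildText("lastName");
        customer.email = element.getChild("email").getValue();
        return customer;
    }

    public static void main(String[] args) throws Exception {
        // Write the sample to a temporary file so the sketch runs on its own.
        File file = File.createTempFile("customer", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), SAMPLE_XML.getBytes(StandardCharsets.UTF_8));

        Customer customer = readCustomer(file);
        System.out.println(customer.firstName + " " + customer.lastName
                + " <" + customer.email + ">");
    }
}
```

To select several elements instead of one, call expression.evaluate(document), which returns a List of Element objects.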

See, done so easily and elegantly!

If you want to explore more, another good example is available at the following URL:

[1] Horstmann & Cornell, Core Java, Volume II, 9th Edition. Pearson Education, 2014.
