Thursday, September 23, 2010

StAX - Another XML Programming Library

     XML is now widely used to exchange data over the Internet, and many emerging technologies are built on it. Parsing an XML file, manipulating the data it contains, and generating XML files are therefore fundamental tasks for many applications.
    There are a variety of existing APIs for XML processing. They can be grouped into four categories (http://en.wikipedia.org/wiki/Xml):
    1. Declarative transformation languages such as XSLT and XQuery;
    2. XML data binding, which automatically translates between an XML document and programming-language objects (for instance, JAXB in Java), an approach that may appeal to object-oriented programmers;
    3. Tree-traversal APIs accessible from a programming language, for example DOM and XOM;
    4. Stream-oriented APIs accessible from a programming language, for example SAX and XNI.

    Here I am going to talk about StAX, the Streaming API for XML, a standard XML processing API that lets you stream XML data from and to your application (http://stax.codehaus.org/Home). As its name indicates, StAX belongs to the stream-oriented category, but it differs from earlier APIs such as SAX and XNI.
    The major difference lies in the pattern they adopt. Earlier APIs such as SAX use a push pattern: the parser passes the content of the XML document to the application as soon as it sees it, whether or not the application is ready for that data. StAX instead adopts a pull pattern: the application actively asks the parser for the next piece of data rather than being fed passively. In other words, in a pull API the client program drives the parser, whereas in a push API the parser drives the client. (http://www.xml.com/pub/a/2003/09/17/stax.html?page=1)
    Another big difference is that StAX is bidirectional: it can not only read XML documents but also write them. SAX, by contrast, provides no support for writing XML.
    In general, because StAX is a stream-oriented API, it can process XML faster and with less memory than DOM and other tree-traversal APIs, which build the whole document tree in memory. These advantages matter most when XML documents are larger than a few megabytes, and they make StAX a good option in ubiquitous computing environments, where devices are resource-constrained.
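    To make the push/pull contrast concrete, here is a minimal sketch (my own illustration, not taken from the cited article) that collects element names from the same small XML string twice: once with a SAX DefaultHandler, where the parser calls startElement on our object, and once with an XMLStreamReader loop, where our code calls next(). The class name PushVsPull and the inline XML are hypothetical; only JDK classes are used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: the same task done with a push (SAX) and a pull (StAX) parser.
class PushVsPull {

    // Push: the SAX parser drives the loop and calls startElement for every tag.
    static List<String> pushNames(String xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName); // we can only react inside the callback
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // Pull: our own loop asks the StAX parser for the next event.
    static List<String> pullNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<String>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<City name=\"Macao\"><Report><Weather>Cloudy</Weather></Report></City>";
        System.out.println(pushNames(xml)); // [City, Report, Weather]
        System.out.println(pullNames(xml)); // [City, Report, Weather]
    }
}
```

    Both methods return the same list; the difference is who owns the loop. In the SAX version the list can only be filled from inside the callback, while in the StAX version the calling code could stop, skip or branch at any point.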
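    A minimal sketch of the two directions working together, assuming everything happens in memory: XMLStreamWriter writes a tiny document into a StringWriter, and XMLStreamReader reads it back from a StringReader. The class RoundTrip and its method names are hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: write XML with StAX, then read it back with StAX.
class RoundTrip {

    // Writes <City name="..."/> (as start and end tags) into a String.
    static String write(String city) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        writer.writeStartDocument();
        writer.writeStartElement("City");
        writer.writeAttribute("name", city); // special characters are escaped by the writer
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    // Reads the name attribute of the first City element, or null if none exists.
    static String readCityName(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("City")) {
                    return reader.getAttributeValue(null, "name");
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = write("Hong Kong");
        System.out.println(xml);
        System.out.println(readCityName(xml)); // Hong Kong
    }
}
```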
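    One practical consequence of streaming with a pull parser is that the application can stop as soon as it has what it needs. The sketch below, a hypothetical helper named FirstTemperature, returns the text of the first Temp element and then closes the reader without requesting any further events; no document tree is ever built. It assumes the input is well-formed up to that element.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: find one value and stop, instead of loading the whole document.
class FirstTemperature {

    // Returns the text of the first <Temp> element, or null if there is none.
    static String firstTemp(Reader input) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Temp".equals(reader.getLocalName())) {
                    // Found it: the remaining events are simply never requested.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close(); // frees the parser; does not close the underlying Reader
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<City name=\"Macao\"><Report time=\"09:00:00\">"
                + "<Weather>Cloudy</Weather><Temp unit=\"C\">31</Temp></Report></City>";
        System.out.println(firstTemp(new StringReader(xml))); // 31
    }
}
```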

    Before the demo, I want to point out that StAX is a pure Java API, is parser-independent, and is standardized as the JSR-173 specification. What's more, StAX is included in JDK 6.0 (also known as JDK 1.6), so it is available out of the box there. Developers using JDK 5.0 or earlier can download a jar file from http://dist.codehaus.org/stax/jars/ and add it to their classpath. Below I show a demo that illustrates how convenient StAX is, since it follows the more familiar iterator design pattern rather than the less familiar observer design pattern that SAX uses. I use JDK 6.0 to keep the configuration simple and to show its built-in StAX support.
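    StAX actually offers two reading styles: the cursor-style XMLStreamReader, and the event-iterator XMLEventReader, which extends java.util.Iterator and hands out XMLEvent objects. A sketch of the iterator style, assuming a small inline document and a hypothetical class named EventIteratorDemo:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo: XMLEventReader used like any other java.util.Iterator.
class EventIteratorDemo {

    // Collects the name attribute of every <City> element.
    static List<String> cityNames(String xml) throws XMLStreamException {
        List<String> names = new ArrayList<String>();
        XMLEventReader events = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent(); // typed alternative to Iterator.next()
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    if (start.getName().getLocalPart().equals("City")) {
                        Attribute name = start.getAttributeByName(new QName("name"));
                        if (name != null) {
                            names.add(name.getValue());
                        }
                    }
                }
            }
        } finally {
            events.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<WeatherReport><City name=\"Hong Kong\"/><City name=\"Macao\"/></WeatherReport>";
        System.out.println(cityNames(xml)); // [Hong Kong, Macao]
    }
}
```

    The event objects are heavier than the cursor, but they can be stored or passed to other methods, which the cursor's current position cannot.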

Parsing documents with StAX
    The XML document I am going to use is called weather.xml; its content is as follows:


<?xml version="1.0" encoding="UTF-8"?>
<WeatherReport date="2007-08-12">
    <City name="Hong Kong">
        <Report time="09:00:00">
            <Weather>Cloudy</Weather>
            <Temp unit="C">34</Temp>
        </Report>
        <Report time="21:00:00">
            <Weather>Thunder</Weather>
            <Temp unit="C">26</Temp>
        </Report>
    </City>
    <City name="Macao">
        <Report time="09:00:00">
            <Weather>Cloudy</Weather>
            <Temp unit="C">31</Temp>
        </Report>
    </City>
    <City name="Beijing">
        <Report time="09:00:00">
            <Weather>Sunny</Weather>
            <Temp unit="C">34</Temp>
        </Report>
    </City>
</WeatherReport>

    XMLStreamReader is the key interface for reading documents in StAX. It represents a cursor that moves across an XML document from beginning to end. At any given time, the cursor points at one thing: a text node, a start-tag, a comment, the beginning of the document, etc. The cursor always moves forward, never backward, and normally moves only one item at a time. You invoke methods such as getName and getText on the XMLStreamReader to retrieve information about the item the cursor is currently positioned at. (http://www.xml.com/pub/a/2003/09/17/stax.html?page=1) My code to display the information contained in weather.xml is below:

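import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StAXDemo {
    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        InputStream in = null;
        try {
            in = new FileInputStream("src/weather.xml");
            XMLStreamReader parser = factory.createXMLStreamReader(in);
            // Pull one event at a time until the end of the document is reached
            for (int event = parser.next(); event != XMLStreamConstants.END_DOCUMENT; event = parser.next()) {
                switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    if (parser.getLocalName().equals("City")) {
                        // Print the city name, e.g. "Hong Kong:"
                        System.out.println(parser.getAttributeValue(null, "name") + ":");
                    } else if (parser.getLocalName().equals("Report")) {
                        // Print the report time, indented by one tab
                        System.out.println("\t" + parser.getAttributeValue(null, "time") + ":");
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    // Skip the whitespace between tags; print real text indented by two tabs
                    if (!parser.getText().trim().equals(""))
                        System.out.println("\t\t" + parser.getText().trim());
                    break;
                }
            }
            parser.close(); // releases the parser, but does not close the underlying stream
        } catch (IOException e) {
            e.printStackTrace();
        } catch (XMLStreamException e) {
            e.printStackTrace();
        } finally {
            if (in != null) {
                try { in.close(); } catch (IOException ignored) { }
            }
        }
    }
}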
The output of the code is:

Hong Kong:
	09:00:00:
		Cloudy
		34
	21:00:00:
		Thunder
		26
Macao:
	09:00:00:
		Cloudy
		31
Beijing:
	09:00:00:
		Sunny
		34

    The for loop makes it clear that, unlike SAX, StAX does not call back into my code; the code itself asks the parser for the next event, which makes the control flow plainer and easier to follow. The loop also conveys the pull pattern: my code requests the data it needs rather than being fed it passively.
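    The same cursor can also pull whole values at once. As a sketch (the class WeatherSummary and its one-line output format are my own, hypothetical choices), the loop below uses getElementText(), which reads the text content of a text-only element and leaves the cursor on its end tag, to turn a document shaped like weather.xml into one line per Report:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: one summary line per <Report>, e.g. "Hong Kong 09:00:00 Cloudy 34C".
class WeatherSummary {

    static List<String> summarize(Reader input) throws XMLStreamException {
        List<String> lines = new ArrayList<String>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
        String city = null, time = null, weather = null, temp = null, unit = null;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("City")) {
                        city = reader.getAttributeValue(null, "name");
                    } else if (name.equals("Report")) {
                        time = reader.getAttributeValue(null, "time");
                    } else if (name.equals("Weather")) {
                        // Reads "Cloudy" etc.; the cursor is left on </Weather>
                        weather = reader.getElementText();
                    } else if (name.equals("Temp")) {
                        // Attributes must be read while still on the start tag
                        unit = reader.getAttributeValue(null, "unit");
                        temp = reader.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("Report")) {
                    lines.add(city + " " + time + " " + weather + " " + temp + unit);
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<WeatherReport date=\"2007-08-12\"><City name=\"Macao\">"
                + "<Report time=\"09:00:00\"><Weather>Cloudy</Weather>"
                + "<Temp unit=\"C\">31</Temp></Report></City></WeatherReport>";
        System.out.println(summarize(new StringReader(xml))); // [Macao 09:00:00 Cloudy 31C]
    }
}
```

    The unit attribute is read before getElementText() is called because, after that call, the cursor sits on the END_ELEMENT event, where getAttributeValue() throws an IllegalStateException.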

Writing XML documents with StAX
    Creating XML documents with StAX is also very easy. Instead of XMLStreamReader we use XMLStreamWriter, which provides methods to write elements, attributes, text, and all the other parts of an XML document. Here is my code to write a simple XML document:

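import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StAXDemo2 {
    public static void main(String[] args) {
        OutputStream out = null;
        try {
            out = new FileOutputStream("data.xml");
            XMLOutputFactory factory = XMLOutputFactory.newInstance();
            // Pass the encoding explicitly so it matches the XML declaration below
            XMLStreamWriter writer = factory.createXMLStreamWriter(out, "UTF-8");
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeStartElement("WeatherReport");
            writer.writeAttribute("date", "2007-08-12");
            writer.writeStartElement("City");
            writer.writeAttribute("name", "Hong Kong");
            writer.writeStartElement("Report");
            writer.writeAttribute("time", "09:00:00");
            writer.writeStartElement("Weather");
            writer.writeCharacters("Cloudy");
            writer.writeEndElement(); // </Weather>
            writer.writeStartElement("Temp");
            writer.writeAttribute("unit", "C");
            writer.writeCharacters("34");
            writer.writeEndElement(); // </Temp>
            // writeEndDocument() would also close these, but explicit calls show the nesting
            writer.writeEndElement(); // </Report>
            writer.writeEndElement(); // </City>
            writer.writeEndElement(); // </WeatherReport>
            writer.writeEndDocument();
            writer.flush();
            writer.close(); // does not close the underlying stream
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (XMLStreamException xse) {
            xse.printStackTrace();
        } finally {
            if (out != null) {
                try { out.close(); } catch (IOException ignored) { }
            }
        }
    }
}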
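The generated file is as follows (indented here for readability; XMLStreamWriter itself writes everything on one line, without indentation):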

<?xml version="1.0" encoding="UTF-8"?>
<WeatherReport date="2007-08-12">
    <City name="Hong Kong">
        <Report time="09:00:00">
            <Weather>Cloudy</Weather>
            <Temp unit="C">34</Temp>
        </Report>
    </City>
</WeatherReport>
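    XMLStreamWriter also escapes text for you. A minimal sketch, writing a hypothetical Report fragment (the class WriterDemo and the weather text are made up for illustration) into a StringWriter with an explicit writeEndElement() for every start tag and no XML declaration:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes a <City><Report><Weather> fragment into a String.
class WriterDemo {

    static String writeReport(String weather) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
        writer.writeStartElement("City");
        writer.writeAttribute("name", "Hong Kong");
        writer.writeStartElement("Report");
        writer.writeAttribute("time", "21:00:00");
        writer.writeStartElement("Weather");
        writer.writeCharacters(weather); // characters such as & and < are escaped
        writer.writeEndElement(); // </Weather>
        writer.writeEndElement(); // </Report>
        writer.writeEndElement(); // </City>
        writer.flush();
        writer.close();
        return buffer.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeReport("Thunder & Rain"));
    }
}
```

    With the input "Thunder & Rain", the Weather element comes out as <Weather>Thunder &amp; Rain</Weather>, and the whole fragment is written on a single line.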

    Through this article, I just want to show that there is another good option for processing XML, one that is stream-oriented and follows the pull pattern. So let StAX be a handy toolkit for Java developers, and make XML processing no longer a big issue!
