... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)

2010-03-07

Streaming Java XML Pipelines

The following example illustrates how to use the standard Java XML APIs (javax.xml.validation and javax.xml.transform) to build a streaming XML pipeline that:

  1. reads an input stream (input)
  2. validates it against an XML schema (xsd1)
  3. transforms it using XSLT (xslt1)
  4. validates the result against another schema (xsd2)
  5. applies another XSLT transformation (xslt2)
  6. validates that result against yet another schema (xsd3)
  7. writes the result to a stream (output)

Here is the code:

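import javax.xml.XMLConstants;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;

// input, xsd1..xsd3, xslt1 and xslt2 are anything StreamSource accepts
// (InputStream, Reader, File); output is anything StreamResult accepts.
SchemaFactory schemas = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

Schema schema1 = schemas.newSchema(new StreamSource(xsd1));
Schema schema2 = schemas.newSchema(new StreamSource(xsd2));
Schema schema3 = schemas.newSchema(new StreamSource(xsd3));

// One SAX validation stage per schema
ValidatorHandler validator1 = schema1.newValidatorHandler();
ValidatorHandler validator2 = schema2.newValidatorHandler();
ValidatorHandler validator3 = schema3.newValidatorHandler();

SAXTransformerFactory transformers =
    SAXTransformerFactory.class.cast(TransformerFactory.newInstance());

Templates template1 = transformers.newTemplates(new StreamSource(xslt1));
Templates template2 = transformers.newTemplates(new StreamSource(xslt2));

// transformer0 and transformer3 are identity transformers (no stylesheet)
Transformer transformer0 = transformers.newTransformer();
TransformerHandler transformer1 = transformers.newTransformerHandler(template1);
TransformerHandler transformer2 = transformers.newTransformerHandler(template2);
TransformerHandler transformer3 = transformers.newTransformerHandler();

// Chain the stages: each one forwards its SAX events to the next
validator1.setContentHandler(transformer1);
transformer1.setResult(new SAXResult(validator2));
validator2.setContentHandler(transformer2);
transformer2.setResult(new SAXResult(validator3));
validator3.setContentHandler(transformer3);
transformer3.setResult(new StreamResult(output));

// Parse the input and push its SAX events into the head of the chain
transformer0.transform(new StreamSource(input), new SAXResult(validator1));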

This code does not introduce any intermediate DOM trees, string buffers or temporary files (of its own, that is; no warranties are offered for the parser or the XSD and XSLT processors). The key part of this solution is its use of SAX handlers for both transformation (TransformerHandler) and validation (ValidatorHandler): each stage forwards its SAX events to the next. Also note the so-called identity transformers at both ends of the pipeline: transformer0 parses the input stream into SAX events, and transformer3 serializes the final events to the output stream. Strictly speaking, the solution could be shortened slightly by calling the validate method on a Validator created from schema1, instead of going through transformer0 and validator1, but I present the code as is to emphasize the transformer's role as the "backbone" of the pipeline.
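For reference, here is roughly what the shortened variant could look like. This is a minimal self-contained sketch, not a drop-in replacement: it has only one XSLT stage, and the class name ShortPipeline, the run method and the inlined toy schemas and stylesheet are all made up for illustration. Validator.validate drives the parse itself, so the leading identity transformer and the first ValidatorHandler drop out:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.InputSource;

class ShortPipeline {
    // Hypothetical toy schemas and stylesheet, inlined so the sketch needs no files.
    static final String XSD1 =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='a' type='xs:int'/></xs:schema>";
    static final String XSD2 =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='b' type='xs:int'/></xs:schema>";
    static final String XSLT1 =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/a'><b><xsl:value-of select='.'/></b></xsl:template>"
        + "</xsl:stylesheet>";

    // Validates against XSD1, applies XSLT1, validates against XSD2, serializes.
    static String run(String inputXml) throws Exception {
        SchemaFactory schemas = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema1 = schemas.newSchema(new StreamSource(new StringReader(XSD1)));
        Schema schema2 = schemas.newSchema(new StreamSource(new StringReader(XSD2)));

        SAXTransformerFactory transformers =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates template1 = transformers.newTemplates(new StreamSource(new StringReader(XSLT1)));

        TransformerHandler transformer1 = transformers.newTransformerHandler(template1);
        ValidatorHandler validator2 = schema2.newValidatorHandler();
        TransformerHandler serializer = transformers.newTransformerHandler(); // identity

        // Wire the tail of the chain exactly as in the long version.
        StringWriter output = new StringWriter();
        transformer1.setResult(new SAXResult(validator2));
        validator2.setContentHandler(serializer);
        serializer.setResult(new StreamResult(output));

        // The Validator parses the input itself and forwards the validated
        // SAX events to transformer1 (a SAXSource pairs with a SAXResult).
        schema1.newValidator().validate(
            new SAXSource(new InputSource(new StringReader(inputXml))),
            new SAXResult(transformer1));
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<a>42</a>"));
    }
}
```

An input that fails the first schema makes validate throw a SAXException, because a Validator without an ErrorHandler treats validation errors as fatal. The trailing identity TransformerHandler is still needed to serialize the final SAX events to the output.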

As far as I can tell, there is no StAX analog to this approach in Java 6. The transformation and validation APIs have not yet been updated to know about StAX input streams. The Javadoc for StAXResult makes a cryptic remark about how Transformer and Validator can accept a Result as input, but I think that this is just awkward wording -- those classes show no evidence of living up to that remark.
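On newer JDKs the picture looks a little different at the edges. The sketch below assumes a JDK 8 or later runtime with the bundled XSLT processor (I have not checked it against Java 6), and the class name StaxEdge, the copy method and the toy schema are made up for illustration. It wraps an XMLStreamReader in a StAXSource and hands it to the identity transformer at the head of the chain. Note that this only adapts StAX input to SAX events; the chain itself is still built from SAX handlers, so it is not a StAX analog of the pipeline:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;

class StaxEdge {
    // Hypothetical toy schema, inlined so the sketch needs no files.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='a' type='xs:int'/></xs:schema>";

    // Reads the input through StAX, validates it via a SAX handler, serializes it.
    static String copy(String inputXml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler validator = schema.newValidatorHandler();

        SAXTransformerFactory transformers =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = transformers.newTransformerHandler(); // identity
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter output = new StringWriter();
        validator.setContentHandler(serializer);
        serializer.setResult(new StreamResult(output));

        // StAX at the head of the chain; everything downstream is SAX.
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(inputXml));
        Transformer identity = transformers.newTransformer();
        identity.transform(new StAXSource(reader), new SAXResult(validator));
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<a>42</a>"));
    }
}
```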
