... until the collector arrives ...

This "blog" is really just a scratchpad of mine. There is not much of general interest here. Most of the content is scribbled down "live" as I discover things I want to remember. I rarely go back to correct mistakes in older entries. You have been warned :)


DOM getElementById vs HTML Parsing in Java

If you parse an HTML document using JTidy, the DOM instance it returns supports only DOM Level 1. The DOM Level 2 method getElementById is "supported", but it unconditionally returns null.
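
A minimal workaround sketch, assuming only DOM Level 1 support: skip getElementById and walk the tree yourself, comparing each element's id attribute. The class FindById and its methods are hypothetical names. The demo builds its sample tree with the JDK's XML parser instead of JTidy, so it runs without extra jars. The search itself uses only Level 1 calls.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FindById {

    /**
     * Depth-first search for the first element whose "id" attribute equals id.
     * Uses only DOM Level 1 methods, so it does not rely on getElementById.
     */
    public static Element findById(Node node, String id) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // id.equals(...) stays safe even if an implementation returns null
            // for a missing attribute (the DOM spec says it should return "").
            if (id.equals(element.getAttribute("id"))) {
                return element;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Element found = findById(children.item(i), id);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    /** Stand-in for an HTML parser: parses well-formed XHTML with the JDK parser. */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><div id=\"main\"><p id=\"intro\">Hello</p></div></body></html>");
        Element intro = findById(doc, "intro");
        System.out.println(intro == null ? "not found" : intro.getTagName()); // prints "p"
    }
}
```

This costs a full tree walk per lookup, so for many lookups it may be worth collecting all ids into a map in a single pass instead.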

With htmlcleaner 1.55 and Java 1.5, the result is a native Java DOM object. It supports getElementById, but it does not recognize the id attributes defined by HTML: per the DOM specification, an attribute named "id" is not of type ID unless a DTD or schema declares it so. Thus it, too, unconditionally returns null. Java 1.5 uses Xerces internally, so it is probably possible to use some of Xerces' HTML parsing capabilities, but this would be an unsupported solution.
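
If the returned Document really is the JDK's own DOM implementation, one possible workaround (a sketch, not tested against htmlcleaner itself) is the DOM Level 3 method Element.setIdAttribute, which is available from Java 1.5 on. It marks each id attribute as an ID, after which getElementById can find it. The class HtmlIds and the method registerIds are hypothetical names. The demo parses a small XHTML string with the JDK parser and no DTD, mimicking the situation above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HtmlIds {

    /** Parses well-formed XHTML with the JDK's built-in parser (no DTD). */
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    /**
     * Marks every non-empty "id" attribute in the subtree as a DOM ID
     * attribute via Element.setIdAttribute (DOM Level 3).
     */
    static void registerIds(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // getAttribute returns "" when the attribute is absent
            if (element.getAttribute("id").length() > 0) {
                element.setIdAttribute("id", true);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            registerIds(children.item(i));
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p id=\"intro\">Hello</p></body></html>");
        // Without a DTD the parser does not treat "id" as an ID attribute.
        System.out.println(doc.getElementById("intro")); // prints "null"
        registerIds(doc);
        System.out.println(doc.getElementById("intro").getTagName()); // prints "p"
    }
}
```

The registration only covers the tree as it exists when registerIds runs; elements added afterwards would need their own setIdAttribute call.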
