Weakly Reachable: DOM getElementById vs HTML Parsing in Java

2007-11-07

If you parse an HTML document with JTidy, the DOM instance it returns supports only DOM Level 1. The DOM Level 2 method getElementById is "supported" in the sense that it exists, but it unconditionally returns null.

With htmlcleaner 1.55 and Java 1.5, you get back a native Java DOM object. It supports getElementById, but it does not recognize the id attributes defined by HTML, presumably because no DTD or schema declares them as type ID, which is what getElementById looks for. So it, too, unconditionally returns null. Java 1.5 uses Xerces internally, so it is probably possible to use some of the Xerces HTML parsing capabilities, but that would be an unsupported solution.
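One way around this, sketched below, is to skip getElementById and walk the tree yourself, comparing each element's id attribute. This is only a minimal sketch: the class name IdLookup and the helpers findById and parse are made up for illustration, and the JDK's built-in XML parser with a well-formed snippet stands in for JTidy or htmlcleaner output. It relies only on the standard org.w3c.dom interfaces, so it should carry over to any Document that implements them.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JTidy, htmlcleaner, or the JDK.
class IdLookup {

    /** Depth-first search for the first element whose "id" attribute equals the given id. */
    public static Element findById(Node node, String id) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            // getAttribute returns "" when the attribute is absent.
            if (id.equals(element.getAttribute("id"))) {
                return element;
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            Element match = findById(child, id);
            if (match != null) {
                return match;
            }
        }
        return null;
    }

    /** Stand-in for an HTML parser: parses well-formed markup without a DTD. */
    public static Document parse(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p id=\"intro\">Hello</p></body></html>");
        // No DTD declares "id" as an ID-typed attribute, so this lookup finds nothing.
        System.out.println(doc.getElementById("intro"));
        // The manual walk compares the attribute by name instead.
        System.out.println(findById(doc, "intro").getTextContent());
    }
}
```

The walk is linear in the size of the document on every call. If you need many lookups, building a Map from id to Element in a single pass would likely be the better trade-off.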
