HtmlCleaner 2.8 is out

Its the first release of 2014, and its got a nice patch from Rafael that makes it run a lot faster (who knew that just checking whether a String is a valid Double in the XPath processor would cause so much stress?) and another patch from Chris that makes it output proper XML ID attributes in DOM.

My contributions this time around were more enhancements to “foreign markup” handling, which is important when cleaning up HTML that contains valid SVG or schema.org content. HtmlCleaner wasn’t really written with that sort of use in mind when Vladimir started on it back in 2006, so it involved a fair bit of wrangling, but I think we’re nearly there now.

Its good to see the number of contributions going up – last release we had an entire GUI contributed by Marton – and I think having a faster release schedule is helping with that. Hopefully one day I’ll be able to make releases that just consist of having applied other people’s patches 🙂

HtmlCleaner is HTML parser written in Java. It transforms dirty HTML to well-formed XML following the same rules that the most web-browsers use. Download HtmlCleaner 2.8 here, or you can get it from Maven Central.

Advertisements
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s