How are things going with HtmlCleaner?

HtmlCleaner is the FOSS project I’ve been maintaining since 2013. So, how is it going so far?

Downloads and users

I can get download stats from Sourceforge, which hosts the binaries, but perhaps a better perspective is the number of times the project is being downloaded via Maven Central as a component of other projects – Sonatype handily provide these statistics.

Last month (June 2015), there were 1,184 downloads from Sourceforge. There were also 10,468 downloads from Maven Central. It’s averaged around that number per month over the year. Sourceforge downloads peaked in 2014, when they hit 1,500 a month, but have been stable at 1,000 or so a month since.

I’m not sure how meaningful these statistics are, but they do seem to show a pretty stable level of interest in HtmlCleaner, which is encouraging. Also, a total of 130,000 downloads a year seems like a lot – there must be quite a few users out there!
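
For what it's worth, here is the back-of-the-envelope sum behind a yearly figure like that, as a rough sketch only. It assumes June's numbers are typical of every month, which they won't quite be, and the function name is purely illustrative.

```python
# Monthly figures quoted above for June 2015.
SOURCEFORGE_JUNE = 1_184
MAVEN_CENTRAL_JUNE = 10_468


def annual_estimate(monthly_sourceforge, monthly_maven, months=12):
    """Extrapolate a yearly total, assuming every month looks like the sample month."""
    return (monthly_sourceforge + monthly_maven) * months


estimate = annual_estimate(SOURCEFORGE_JUNE, MAVEN_CENTRAL_JUNE)
print(f"{estimate:,}")  # prints 139,824
```

That lands in the same ballpark as the yearly total mentioned above; the exact number depends on how much each month drifts from June.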


Although we haven’t added more committers to the project (something I’m keen to do), users have submitted a lot more patches over the past year, and these have been included in releases. Most recent releases have included at least one user-submitted patch.

(My general philosophy on patches is that, if they work and are well tested, they go in – I don’t have any sort of ideological preferences for how code is written or whether I think a feature is necessary; if a user wants something to the extent that they create a patch for it, it’s a valid feature request by definition.)

We have had lots of users submitting bugs and questions too, which is a great sign. (No, really, I like seeing bug reports! Only software with no users has no visible bugs…)


Release frequency has been a bit patchy, something that’s entirely my fault as all kinds of other priorities get in the way. We’ve had 3 releases so far in 2015, 3 in 2014, and 6 in 2013. Before that, though, there was only one in 2010 and two in 2008, so we’re still doing well!

The Code

We added some new features this year, finally updating to full HTML5 tag set support, and adding some much nicer command-line operations. However, I think we’re getting close to the point where I need to strip down and rebuild the engine for a 3.0 version, as we’re coming up against the limits of tweaking the existing engine.

If cleaning up shoddy HTML is something that interests you, pop along to HtmlCleaner and help out!

