Why don't you open source your algorithm so more folks can work on it with you? I've been futzing with Readability JS converted to PHP (and could port it to Ruby or Python), and it would be great to collab and share test files, etc.
I'd be interested in working on this project; it's a problem I've come across quite a bit. There's even an academic contest for it, called CLEANEVAL, although the way they set up the problem was arguably not quite right.