May 25, 2006
Keep On Truckin'
Reading Danny's post earlier reminded me that I'd been meaning to do an online version of John Cowan's TagSoup parser.
I've often found the W3C HTML Tidy service quite useful for fixing up dodgy HTML so that I can scrape data from it using XSLT. Sometimes, though, Tidy isn't quite able to handle all the bizarre HTML variants thrown at it, whereas TagSoup's "Keep On Truckin'" philosophy seems to let it deal with a wider array of problems. (This is anecdotal evidence; I've not done a true comparison of the tools.)
Anyway, I've packaged up TagSoup so you can try it for yourself. Here's the documentation. The service supports all the same parameters as the TagSoup command line.
It even supports PYX! Here's Danny in line-mode.
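If you've not met PYX before, it's a line-oriented rendering of an XML document: one parse event per line, with the first character giving the event type (`(` start tag, `A` attribute, `-` text, `)` end tag, `?` processing instruction). Here's a rough Python sketch of the idea. It is not TagSoup's code, it only handles input that is already well-formed XML, and the names `PyxHandler`, `escape` and `to_pyx` are made up for illustration. The escaping rules are my understanding of the usual PYX convention.

```python
import xml.sax


def escape(text):
    # Assumed PYX convention: backslash, newline and tab are
    # backslash-escaped so that each event stays on a single line.
    return text.replace("\\", "\\\\").replace("\n", "\\n").replace("\t", "\\t")


class PyxHandler(xml.sax.ContentHandler):
    """Collects SAX events as PYX-style lines (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def startElement(self, name, attrs):
        self.lines.append("(" + name)
        for key in attrs.getNames():
            self.lines.append("A" + key + " " + escape(attrs.getValue(key)))

    def endElement(self, name):
        self.lines.append(")" + name)

    def characters(self, content):
        # The parser may deliver one text node in several chunks;
        # each chunk becomes its own "-" line here.
        self.lines.append("-" + escape(content))

    def processingInstruction(self, target, data):
        self.lines.append("?" + target + " " + escape(data))


def to_pyx(xml_text):
    """Return the PYX-style lines for a well-formed XML string."""
    handler = PyxHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "\n".join(handler.lines)


if __name__ == "__main__":
    print(to_pyx('<p class="x">hi</p>'))
```

The appeal is that output like this plays nicely with line-based tools such as grep and sed, which is handy when you just want to pull a few values out of a page.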
