TagSoup Clean-Up Service
- Introduction
- Base URL
- Request Methods
- Request Parameters
- Response Codes
- Response Format
- Implementation Notes
Introduction
The XAK TagSoup Clean-Up Service is a simple online service for on-the-fly correction of malformed and dodgy HTML documents found in the wild. Once cleaned up, the documents can be processed further, e.g. to extract metadata or apply XSLT transformations.
The service is based on John Cowan's TagSoup Parser.
Base URL
The base URL of the service is: http://xmlarmyknife.com/api/xhtml/tagsoup
Request Methods
This service currently only supports the HTTP GET method.
Request Parameters
| Parameter | Notes | Required? | Occurrence |
|---|---|---|---|
| html-uri | URL of HTML data to process | Yes | 1 |
TagSoup supports a number of other parameters, and these same parameters can be applied to this service. Consult the TagSoup documentation (section "TagSoup as a stand-alone program") for a complete list of options.
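As an illustration, a request URL could be assembled as in the sketch below. The base URL and the `html-uri` parameter come from this page; the function name `build_request_url` and the `extra_params` pass-through are hypothetical, and the way TagSoup options map onto query parameters is an assumption to check against the TagSoup documentation.

```python
from urllib.parse import urlencode

# Base URL as given in the "Base URL" section.
BASE_URL = "http://xmlarmyknife.com/api/xhtml/tagsoup"


def build_request_url(html_uri, extra_params=None):
    """Return the GET URL asking the service to clean up the page at html_uri."""
    if not html_uri:
        # html-uri is the one required parameter.
        raise ValueError("html-uri is required")
    params = [("html-uri", html_uri)]
    # Hypothetical pass-through for additional TagSoup options (assumption:
    # each option is sent as an ordinary query parameter).
    params.extend((extra_params or {}).items())
    # urlencode percent-encodes the target URL so it survives as one value.
    return BASE_URL + "?" + urlencode(params)


print(build_request_url("http://example.com/page.html"))
# prints http://xmlarmyknife.com/api/xhtml/tagsoup?html-uri=http%3A%2F%2Fexample.com%2Fpage.html
```

Encoding the `html-uri` value matters because the target URL may itself contain `?` and `&` characters that would otherwise be read as part of the service request.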
Response Codes
- 200: successful transformation
- 400: missing parameter
- 500: error fixing data or fetching data
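A client might translate these codes into readable errors along the following lines. This is a sketch only: the three meanings are taken from the list above, while `describe_status`, `fetch_clean`, and the 30 second timeout are hypothetical choices, not part of the service.

```python
import urllib.error
import urllib.request

# Meanings as documented in the "Response Codes" section.
STATUS_MEANINGS = {
    200: "successful transformation",
    400: "missing parameter",
    500: "error fixing data or fetching data",
}


def describe_status(code):
    """Return the documented meaning of a status code, if there is one."""
    return STATUS_MEANINGS.get(code, "undocumented status")


def fetch_clean(request_url, timeout=30):
    """Fetch a cleaned-up document, raising RuntimeError on an HTTP error."""
    try:
        with urllib.request.urlopen(request_url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        raise RuntimeError(f"{err.code}: {describe_status(err.code)}") from err


print(describe_status(400))  # prints "missing parameter"
```

Since a 500 covers both a failed fetch of the source page and a failed clean-up, a caller cannot tell the two apart from the status code alone.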
Response Format
The service currently returns all responses with a Content-Type of application/xhtml+xml, unless the html option is specified, in which case the response is served as text/html.
TagSoup also supports responses in PYX format. These responses are returned as text/plain.
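Because the Content-Type differs by output format, a client can use it to decide how to handle the body. The sketch below shows one way to do that; the media types are those listed above, while the function name `response_kind` and the labels it returns are hypothetical.

```python
# Media types as documented in the "Response Format" section,
# mapped to illustrative labels chosen for this example.
KIND_BY_MEDIA_TYPE = {
    "application/xhtml+xml": "xhtml",
    "text/html": "html",
    "text/plain": "pyx",
}


def response_kind(content_type):
    """Classify a response by its Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return KIND_BY_MEDIA_TYPE.get(media_type, "unknown")


print(response_kind("application/xhtml+xml; charset=utf-8"))  # prints "xhtml"
```

A response classified as `xhtml` can be handed to an XML parser or an XSLT processor, whereas `html` and `pyx` output needs a parser suited to those formats.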
Implementation Notes
This service has been implemented using TagSoup 1.0 Release Candidate 6.