
May 25, 2006

Experimenting with EmbeddedRDF and GRDDL Support

Embedded RDF is a method of embedding (a subset of) RDF within XHTML and HTML documents. A simple XSLT transformation can be used to extract the RDF from within the document.

A related and more generalised technology is GRDDL, which defines how to associate transformation algorithms (i.e. XSLT stylesheets) with XHTML profiles or microformats so that there's a clear mapping from embedded metadata into RDF.

I've been experimenting with adding support for both of these technologies in the XMLArmyKnife SPARQL query service. This provides a means to directly query RDF embedded in XHTML documents.

The mechanism works as follows. When the service retrieves some remote data it checks the Content-Type of the response. If it's application/xhtml+xml or text/html it applies the following rules; otherwise it's business as usual and the content is parsed as RDF.

If possible the retrieved content is parsed as XML and then inspected to discover the XHTML profiles associated with the content.

If the profile URIs include that of Embedded RDF, it applies a suitable stylesheet to extract some triples.

It also looks for the GRDDL data view profile. If that profile is found, the processor tries to find any additional transformations the author has associated with the document. This mechanism is defined in the GRDDL Profile for XHTML. Essentially it just looks for all link elements with rel="transformation" attributes. If it finds any, each one is applied in turn.

The end result is a single chunk of RDF which is then made available to the SPARQL query as normal.
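The discovery steps above can be sketched in a few lines. This is only an illustration in Python, not the Jena-based code behind the service: the function names are made up, the two profile URIs are my assumption of the published Embedded RDF and GRDDL data view profile URIs, and the sketch stops at deciding which stylesheets to run rather than applying any XSLT.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumed profile URIs; check them against the Embedded RDF and GRDDL documents.
ERDF_PROFILE = "http://purl.org/NET/erdf/profile"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
HTML_TYPES = {"application/xhtml+xml", "text/html"}


def is_html(content_type):
    """True if the Content-Type header names an (X)HTML media type."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def plan_transformations(xhtml_text):
    """Return (use_erdf, transformation_hrefs) for a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return False, []
    # The profile attribute holds a whitespace-separated list of URIs.
    profiles = head.get("profile", "").split()
    use_erdf = ERDF_PROFILE in profiles
    hrefs = []
    # Author-supplied transformations only count under the data view profile.
    if GRDDL_PROFILE in profiles:
        for link in head.iter(f"{{{XHTML_NS}}}link"):
            rels = link.get("rel", "").split()
            if "transformation" in rels and link.get("href"):
                hrefs.append(link.get("href"))
    return use_erdf, hrefs


# Hypothetical page that declares both profiles and links one stylesheet.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://purl.org/NET/erdf/profile http://www.w3.org/2003/g/data-view">
    <title>Example</title>
    <link rel="transformation" href="http://example.org/hcal2rdf.xsl"/>
    <link rel="stylesheet" href="style.css"/>
  </head>
  <body/>
</html>"""

if __name__ == "__main__":
    print(is_html("text/html; charset=utf-8"))
    print(plan_transformations(SAMPLE))
```

For the sample page this reports that the Embedded RDF stylesheet should be applied and that one author-supplied transformation was found. The ordinary CSS link is ignored because its rel value isn't "transformation".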

The mechanism allows you to write queries such as this:


PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?blog 
FROM <http://iandavis.com>
WHERE 
{<http://iandavis.com/#ian> foaf:weblog ?blog.}

...which discovers Ian Davis's blogs by querying his homepage.

A similar technique can be applied to directly query Dan Connolly's homepage to discover the dates he's attending the WWW2006 conference.

Dan's homepage is interesting as he's combined Embedded RDF with hCalendar. It nicely illustrates that you can merge together multiple RDF views of the same source page, as well as demonstrating that SPARQL can be applied to microformat content very easily. All that's needed is a suitable XSLT stylesheet. Danny wrote a nice posting looking at Microformats on the GRDDL which makes useful background reading.

The current code needs to be generalised to support arbitrary profile URIs. The GRDDL specification outlines a more general solution: discovering transformations by dereferencing the profile URI. That aside, the current implementation can deal with any transformation licensed via the "data view" profile. If you include a suitable link then any microformat is already supported.

I'd like to gather some feedback on the initial implementation first though. Let me know if you cook up any cool demonstrations.

The way I've implemented the support is via plugging in some custom code to the Jena FileManager component, so I may consider releasing it as a separate Jena contribution. Mail me if you're interested in that.

Oh, and of course this mechanism isn't restricted to just querying via SELECT. You can extract the data as RDF using CONSTRUCT and DESCRIBE. It would be interesting to plug it into Slug to provide a way to crawl and aggregate microformat content.
