Redland

Dave Beckett

 
 

Redland librdf RDF API Library - Contexts

Introduction

Redland 0.9.12 introduced a major new feature called Redland contexts, which supports various ways of managing triples within an RDF graph. This document describes Redland contexts: how they came about, what they are, and their intended or possible uses.

(See history for the original use case)

RDF and tracking triples

RDF graphs are sets of triples (subject, predicate, object) and it is common to want to merge graphs from multiple sources by unioning the graphs/sets. Where there are duplicate triples in some sets, the triple's source is lost in the merge. The RDF graph doesn't preserve this source information, so it needs to be provided by applications. RDF also does not provide scoping for subgraphs or ways to quote triples (in the graph but not part of the conjunction of triples semantics). RDF does have reification, but it seems more like a syntax abbreviation than a good way to model things.
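The loss of source information in a merge can be seen in a few lines of Python. This is only a sketch: triples are modelled as plain tuples, and the URIs and property names are made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples; the URIs are made up.
graph_a = {
    ("http://example.org/alice", "foaf:name", "Alice"),
    ("http://example.org/alice", "foaf:knows", "http://example.org/bob"),
}
graph_b = {
    ("http://example.org/alice", "foaf:name", "Alice"),  # also stated in graph_a
    ("http://example.org/bob", "foaf:name", "Bob"),
}

# Merging RDF graphs is a set union, so the duplicate collapses to one triple.
merged = graph_a | graph_b

print(len(graph_a) + len(graph_b))  # number of statings that went in
print(len(merged))                  # number of distinct triples that remain
# Nothing in `merged` records which source(s) each triple came from.
```

Once the union is taken there is no way to ask the merged set which source stated the shared triple, or to take one source back out again.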

Add a 4th, 5th or 6th item to the triples

Changing the RDF model or graph to break the triples model into, say, quads was a non-goal. This would mean ripping up all the existing software, designing a new non-RDF model based around 4-ary items, and losing all interoperability with triples. Other proposals have added more items, such as graph identity or some sort of ordering. These remain untested or research ideas rather than anything compatible with RDF.

Contexts - but whose?

There have previously been proposals to make RDF triples have or be part of contexts - a general term which has been used in different ways (or contexts, ahem, sorry). In Notation 3[5], Berners-Lee defines an N3-context which (as far as I understand it) relates two N3-formulae in a triple like this:

{s1} foo:prop {s2}

where each of s1 and s2 are some set of triples - i.e. an RDF graph. These are nested graphs so the triples in s1 and s2 are not in the outer graph that contains the triple above.

There are also contexts used in the sense of R.V. Guha[1], who was a major part of creating RDF after his MCF work at Apple/Netscape (and who left them out or didn't get them into RDF in 1998, so I guess he had good reasons!):

So I certainly wasn't going to enter into something with that theoretical depth to solve some smaller goals of tracking triples from merges (and others, see the examples below).

Redland Contexts

Don't break RDF - no quads

Given that adding new items to the triples was a bad idea that would break the RDF graph model, and that adding subgraphs/quoting or Guha-contexts was also going to break things or be hard to make consistent, a simpler approach was needed.

When adding triples to a graph from multiple sources, it would be easy to allow an identifier to be given at that time so the method would be graph.add(triple, identifier). If this could somehow be associated with that particular addition of the triple, it might be possible to use that later. This identifier, which is a Redland node (URI or literal or blank node), is called the context node in Redland contexts.

Similarly, it would also be natural to have graph.addTriples(tripleGenerator, contextNode) when adding a set of triples generated by something such as parsing RDF/XML, a query on a graph etc. In Redland, a sequence of triples is represented by a Redland stream (for 0.9.11 and earlier the only method was get() which returned the next triple and moved the generator forward).

Assuming that the Redland triplestore implementations somehow recorded these triple<>context node details, how would this best be used?
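To make the idea concrete, here is a minimal sketch of a store that records the triple<>context node association at the time of addition. ContextGraph, add_triples and fake_parse are hypothetical names chosen to mirror the pseudo-calls above; they are not Redland's API, and a real store would do something smarter than keep a list.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")


class ContextGraph:
    """Hypothetical store: a list of (triple, context node) statings."""

    def __init__(self):
        self._statings = []

    def add(self, triple, context=None):
        # The triple itself is unchanged; the context is recorded beside it.
        self._statings.append((triple, context))

    def add_triples(self, triple_generator, context=None):
        # Any iterable of triples will do, e.g. the output of a parser.
        for triple in triple_generator:
            self.add(triple, context)

    def __len__(self):
        return len(self._statings)


def fake_parse(uri):
    """Stand-in for an RDF/XML parser: yields triples 'found' at uri."""
    yield Triple(uri, "dc:creator", "Alice")
    yield Triple(uri, "dc:title", "Home page")


graph = ContextGraph()
source = "http://example.org/a.rdf"  # illustrative context node
graph.add(Triple("ex:alice", "foaf:name", "Alice"), source)
graph.add_triples(fake_parse(source), source)
print(len(graph))
```

Note that the Triple objects never learn about their context; only the graph knows which stating came with which context node.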

Redland fundamentals

Redland triples (statements in the API) are just 3 Redland nodes (URIs or literals or blank nodes) and are independent of any graph or store. Redland nodes similarly are not attached to any other object. In particular, it is very common and natural to use the same nodes and statements across multiple graphs, so tying them to one would have been hard to implement, or rather meaningless. This means that extending them with a model/storage-specific context would have been a mistake and counter to this original design.

Redland graphs and stores were always bags of triples (before it was clearly defined that RDF graphs were sets) so a query of the form model.find_statement(s, p, o) would return a stream of answer triples (s,p,o), (s,p,o), ... This proved useful when contexts appeared, since the multiple triples returned represented different statings of the triple, potentially with different context nodes.
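The bag behaviour can be sketched as follows. BagModel is a hypothetical toy, not Redland's model class; None in the query pattern is assumed to act as a wildcard, and the context is handed back alongside each triple purely to show that the repeated answers are distinct statings.

```python
class BagModel:
    """Hypothetical bag-of-triples model; None in a pattern is a wildcard."""

    def __init__(self):
        self._statings = []  # list of (triple, context), duplicates allowed

    def add(self, triple, context=None):
        self._statings.append((triple, context))

    def find_statement(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        for triple, context in self._statings:
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple, context


model = BagModel()
name = ("ex:alice", "foaf:name", "Alice")
model.add(name, "http://example.org/a.rdf")
model.add(name, "http://example.org/b.rdf")
model.add(("ex:bob", "foaf:name", "Bob"), "http://example.org/b.rdf")

# The same triple comes back once per stating.
results = list(model.find_statement("ex:alice", "foaf:name", None))
for triple, context in results:
    print(triple, "stated in", context)
```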

Contexts with querying

The context information is most useful when queries are done on the graph, so that the user is dealing with a particular triple in a query over some graph. It is at that point that the context (in the graph) of that triple or node becomes important. This arises in Redland when a model.find_statement(s, p, o), model.get_target(s, p) or some other API call is done that returns a sequence of answers (triples or nodes). Sequences of statements are streams; sequences of nodes are Redland iterators.

As previously mentioned, the 0.9.11 and earlier API for streams (and iterators) was that once constructed, the user could run a stream.get() method to return the next result and move the sequence on. To return the context of the current statement, this get-and-move sequence had to be modified into a separate stream.current() and stream.next(). Context nodes would be returned by a new stream.context() method.

In Redland 0.9.12 this change was implemented.

C API example (pseudocode) for Redland 0.9.11 and earlier:

  while !stream.end()
    statement=stream.next()
    /* do something with the statement */

and in Redland 0.9.12:

  while !stream.end()
    statement=stream.current()
    /* do something with the statement */
    stream.next()

A new stream method, get_context (written stream.context() in the pseudocode here), was added; it returns the Node for the context of the current Statement.

Stream context example (pseudocode):

  while !stream.end()
    statement=stream.current()
    context_node=stream.context()
    /* do something with the statement and its context node */
    stream.next()

This was replicated for the iterator class returning sequences of nodes in response to a query on a model.

So the key point is: the stream/iterator instance, made from a particular model, holds the extra information. Triples (statements) and Nodes remain as defined in RDF's abstract syntax.
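The current/next/context protocol, and the point that the context is held by the stream rather than the statement, can be sketched like this. The Stream class below is a hypothetical stand-in using the method names from the pseudocode above; it is not the Redland binding.

```python
class Stream:
    """Hypothetical stream over (statement, context_node) pairs."""

    def __init__(self, statings):
        self._statings = list(statings)
        self._position = 0

    def end(self):
        return self._position >= len(self._statings)

    def current(self):
        # Returns the statement without advancing the stream.
        return self._statings[self._position][0]

    def context(self):
        # The context lives in the stream, not in the statement.
        return self._statings[self._position][1]

    def next(self):
        self._position += 1


statement = ("ex:alice", "foaf:name", "Alice")  # a plain triple, no context
stream = Stream([
    (statement, "http://example.org/a.rdf"),
    (statement, "http://example.org/b.rdf"),
])

seen = []
while not stream.end():
    seen.append((stream.current(), stream.context()))
    stream.next()
print(seen)
```

Both answers are the very same statement object; only the stream position tells the two statings apart.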

Other API calls that were useful and added were:

model.context_add_statement(statement, context)
model.context_remove_statement(statement, context)

# Remove all statements from the model in a given context
model.remove_context(context)

# List all statements in the model in a given context
stream=model.context_serialize(context)
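As a rough illustration of how these four calls fit together for merging and demerging, here is a toy in-memory model. The method names follow the list above, but the class itself, the statements() helper and the data are assumptions made for this sketch only.

```python
class Model:
    """Hypothetical in-memory model holding (statement, context) statings."""

    def __init__(self):
        self._statings = []

    def context_add_statement(self, statement, context):
        self._statings.append((statement, context))

    def context_remove_statement(self, statement, context):
        # Removes one stating; the same statement in other contexts stays.
        self._statings.remove((statement, context))

    def remove_context(self, context):
        self._statings = [(s, c) for s, c in self._statings if c != context]

    def context_serialize(self, context):
        return iter([s for s, c in self._statings if c == context])

    def statements(self):
        # Helper for this sketch only: every statement, one per stating.
        return [s for s, _ in self._statings]


model = Model()
a, b = "http://example.org/a.rdf", "http://example.org/b.rdf"
shared = ("ex:alice", "foaf:name", "Alice")
only_a = ("ex:alice", "foaf:nick", "ali")

# Merge: each source's triples go in under that source's context node.
model.context_add_statement(shared, a)
model.context_add_statement(only_a, a)
model.context_add_statement(shared, b)
print(list(model.context_serialize(a)))

# Demerge: drop everything that came from source a.
model.remove_context(a)
print(model.statements())
```

The shared statement survives the demerge because source b also stated it, which is exactly what a plain set union could not express.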

Using Redland Contexts

This feature can be used for a variety of things depending on how the context nodes are used with the triples. The following is not an exhaustive list:

  • Enable true graph merging / updating / demerging: identify the subgraphs (sets of triples from particular sources) with context nodes.
  • Statement identity: add each triple with a different context node. RDF's model does not assign identity to triples. Reification might also be used with this approach.
  • Statement provenance: use the context node as the subject of other statements about the statement that is returned.
  • Subgraphs: similar to the merging approach, but consider the RDF graph to be a set of graphs and manipulate them as such. (Aside: Redland is gaining model aggregation to do this explicitly.)
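The provenance use can be sketched as a two-step lookup: find a statement, take its context node, then ask the graph what it says about that node. Everything here (the find helper, the list of statings, the dc: properties and values) is illustrative and not taken from Redland or FOAFBot.

```python
# (statement, context node) statings; every name and value is illustrative.
source = "http://example.org/alice.rdf"
statings = [
    (("ex:alice", "foaf:name", "Alice"), source),
    # Statements about the source itself use the context node as subject.
    ((source, "dc:creator", "ex:alice"), None),
    ((source, "dc:date", "2003-08-01"), None),
]


def find(s=None, p=None, o=None):
    """Yield (statement, context) for statings matching the pattern."""
    for statement, context in statings:
        if all(w is None or w == g for w, g in zip((s, p, o), statement)):
            yield statement, context


provenance = {}
for statement, context in find(p="foaf:name"):
    # Look up what the graph says about where this statement came from.
    provenance[statement] = {st[1]: st[2] for st, _ in find(s=context)}
print(provenance)
```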

As of now (August 2003), as far as I understand it, most people are using this in Redland for the graph merging/demerging case such as in FOAFBot and the MINDswap website.

Implementation

Just a matter of months of coding and debugging, as the internals of iterator and stream were replaced and all the model and store interfaces and implementations were changed to store this information.

more later - FIXME

Problems

The method model.contains_statement(statement) fails in the current implementation, since it will not match a particular statement but only a stating of such a statement in a context. This method will need to be reimplemented, but since it isn't used much (I think), it would be OK for it to be less efficient.
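One way this kind of failure can arise, and what a slower replacement might look like, is sketched below. It assumes a store that keeps (statement, context) statings in a set; that layout and both function names are assumptions for illustration, not a description of Redland's internals.

```python
# Assumption: the store keeps a set of (statement, context) statings.
statings = {
    (("ex:alice", "foaf:name", "Alice"), "http://example.org/a.rdf"),
    (("ex:bob", "foaf:name", "Bob"), None),
}


def contains_statement_naive(statement):
    # Looks up the bare statement, but the entries are (statement, context).
    return statement in statings


def contains_statement_scan(statement):
    # Slower: checks every stating and ignores its context node.
    return any(stated == statement for stated, _context in statings)


alice = ("ex:alice", "foaf:name", "Alice")
print(contains_statement_naive(alice))
print(contains_statement_scan(alice))
```

The scan is linear in the number of statings, which matches the expectation above that a correct version may have to be less efficient.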

History - what Edd originally wanted

Edd Dumbill was using Redland for FOAFBot (as described in Support online communities with FOAF[3]) and he wanted to:

  • annotate each statement added to the model with a node, the context node
  • get back the context node associated with any result obtained from any of the model query methods
  • be able to get an iterator over statements matching a particular context
  • be able to selectively remove a statement in a particular context
  • would be nice: null context for statements with no context node

Edd reports on his use of this feature in the updated FOAFBot, made with Redland and Python, in Tracking provenance of RDF data[2] and Finding friends with XML and RDF[4].

References

[1] Contexts: A Formalization and Some Applications, R.V. Guha, PhD thesis, 1995, Stanford University (PostScript, 146pp).
[2] Tracking provenance of RDF data, Edd Dumbill, IBM developerWorks.
[3] Support online communities with FOAF, Edd Dumbill, IBM developerWorks.
[4] Finding friends with XML and RDF, Edd Dumbill, IBM developerWorks.
[5] Notation 3 -- Ideas about Web architecture, Tim Berners-Lee, 1998-2001.

Last Modified: $Date: 2004/01/04 21:21:16 $