State of Redland 2007-02

Redland was born 2000-08. Happy 6.5th birthday!

This is a review of roughly the last 15 months, since I moved to the USA in October 2005 to work for Yahoo! Media Group in Sunnyvale, California. It covers:

(This is on the web at http://librdf.org/2007/02/18-state/)

1. Redland Users

Redland is made available by several Linux, Unix and other open source projects such as:

and the libraries are also used inside other applications and services such as, for example:

2. State of the libraries

My summary of the high-level state of the libraries is:

Raptor syntax parsing and serializing: libraptor
Very mature. The API rarely changes; most work is bug fixes, new features for existing parsers/serializers, or entirely new ones.
Rasqal query parsing, executing: librasqal
Under development. The API changes with each release because it is not yet complete and the SPARQL query engine implementation is not fully functional.
Redland RDF API and triple stores: librdf
Mature. The API is still changing a little to add new features, especially for query and storage.
Binding languages
A mixture of mature bindings, such as Perl and Python, which are well tested, working and complete, and immature ones with little testing or known to be incomplete, such as Tcl and Java.
I feel the set is too large for one person with all the N-language skills to maintain, unless that person is me and I do nothing else!

3. Releases

For each of the libraries, the period above has seen the following releases with major changes:

Raptor 1.4.8 - 1.4.14 (7 releases)
  • A new user tutorial covering the entire API was written.
  • A new RSS tag soup parser was added.
  • New Atom 1.0, RSS 1.0, Turtle and DOT serializers were added.
Rasqal 0.9.11 - 0.9.13 (3 releases)
  • Updated the SPARQL syntax support to match the November 2005 and April 2006 W3C Working Drafts.
  • Can now serialize query results to JSON.
  • Added APIs to manage query results serializing.
  • The query engine had its ordering, distinct and limit support fixed.
  • Lots of internal query engine changes, in particular to split the query parsing ('prepare') and the query execution ('execute'). These were too intertwined in earlier versions. So now you can nearly execute the same query multiple times.
Redland 1.0.3-1.0.5 (3 releases)
  • A new PostgreSQL storage was added.
  • Many fixes for SQLite storage.
Language Bindings 1.0.3.1-1.0.5.1 (3 releases)
  • Many fixes were made across all the bindings especially to handle query results.
  • The Python and Ruby bindings got many fixes.

and all of them have benefited from better API documents using gtk-doc to replace the older kernel-doc, giving better DocBook and better HTML output. The entire project also switched over from CVS to Subversion early in 2006.

4. Challenge

The main challenge I see is to make the project more scalable - moving from the current state where I do all the packaging and am the main developer. To help this, my goals for 2007 are to:

5. General Tasks

More of a wishlist than an ordered list

DRY = Don't Repeat Yourself

5.1 Pending stuff

There are several tasks already in progress either sitting in a patch, in Subversion or underway separately.

5.2 Raptor tasks

5.3 Rasqal tasks

5.4 Bindings tasks

6. Future Ideas

6.1 New Version control system

This is more speculative and I am giving no firm commitment that this will happen soon. Subversion is stable and well supported.

Move from Subversion to a more distributed development-friendly version control system.

My requirements for a new VCS:

GIT seems one possibility - I tried this conversion already and it worked well. With Mercurial, I couldn't get the repository converted without losing information. SVK I'm not so sure about, as I don't like VCSs that are layered on others, e.g. CVS still leaks its original RCS basis. I didn't try DARCS. Arch / Bazaar / Bazaar-NG is too bleeding edge. This is a medium-term goal.

6.2 Raptor Version 2

This is a break-the-binary-API choice, not a rebuild. The main reason to do this would be to add a 'world' style argument to constructors, as Redland has, similar to the curl handle, APR pool or BDB environment. This would mean that raptor_init() and raptor_finish() would be replaced by something like rw = raptor_new_world() and raptor_free_world(rw).
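As a rough illustration of the 'world' idea, here is a minimal C sketch. Every name in it (sketch_world, sketch_new_parser and so on) is hypothetical and invented for this example; it is not the Raptor API and makes no claim about how a V2 would actually look. The point is only that state which would otherwise live in library-wide globals moves into an object the caller creates, passes to constructors, and frees.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout: this is NOT the Raptor API. */

/* Holds state that an init()/finish() design would keep in globals. */
typedef struct {
  int open_parsers;
} sketch_world;

/* Every object remembers the world that owns it. */
typedef struct {
  sketch_world *world;
  const char *syntax;
} sketch_parser;

sketch_world *sketch_new_world(void) {
  return calloc(1, sizeof(sketch_world));
}

void sketch_free_world(sketch_world *w) {
  free(w);
}

/* The constructor takes the world as its first argument. */
sketch_parser *sketch_new_parser(sketch_world *w, const char *syntax) {
  sketch_parser *p;
  if (!w)
    return NULL;
  p = malloc(sizeof(*p));
  if (!p)
    return NULL;
  p->world = w;
  p->syntax = syntax;
  w->open_parsers++;
  return p;
}

void sketch_free_parser(sketch_parser *p) {
  if (!p)
    return;
  p->world->open_parsers--;
  free(p);
}

/* Two independent worlds in one process; returns the number of
 * parsers that were open in the first world (or -1 on failure). */
int sketch_demo(void) {
  sketch_world *a = sketch_new_world();
  sketch_world *b = sketch_new_world();
  sketch_parser *p1, *p2;
  int seen;

  if (!a || !b) {
    sketch_free_world(a);
    sketch_free_world(b);
    return -1;
  }
  p1 = sketch_new_parser(a, "rdfxml");
  p2 = sketch_new_parser(a, "turtle");
  assert(p1 && p2);
  assert(b->open_parsers == 0); /* world b is unaffected by world a */
  seen = a->open_parsers;

  sketch_free_parser(p1);
  sketch_free_parser(p2);
  sketch_free_world(a);
  sketch_free_world(b);
  return seen;
}
```

Because nothing is global, two users of the library in the same process do not have to coordinate who calls the init and finish functions.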

One other reason to do this would be to add a pull-style triple parser, where the model is:

parser = new RDFXMLparser()
parser.start_parse( { URI => uri} )
while (not parser.done())
  triple = parser.raptor_get_next_triple()
  ...
delete parser

... rather than the current one of receiving triples via a callback.

However, this would need either a pull-based WWW library (I know of only libwww and I don't want to use that), or batching up the triples in memory by wrapping the push-based parser, or multiple threads, which have their own set of problems. This would also need an update to the raptor_iostream class to add read methods, but that's easier than the first problem. So this is likely not V2 stuff.
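To make the "batch up the triples in memory" option concrete, here is a small C sketch of a pull interface wrapped around a push-style parser. All names are hypothetical, and sketch_push_parse() is only a stand-in that emits three fixed triples through a callback; none of this is Raptor code. It assumes the triple strings outlive the wrapper, which a real wrapper could not assume and would have to handle by copying.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout: this is NOT the Raptor API. */
typedef struct { const char *s, *p, *o; } sketch_triple;
typedef void (*sketch_handler)(void *user_data, const sketch_triple *t);

/* Stand-in for a push-style parser: calls the handler once per triple. */
static void sketch_push_parse(sketch_handler handler, void *user_data) {
  static const sketch_triple data[] = {
    {"ex:a", "ex:knows", "ex:b"},
    {"ex:b", "ex:knows", "ex:c"},
    {"ex:c", "ex:name", "\"C\""}
  };
  size_t i;
  for (i = 0; i < sizeof(data) / sizeof(data[0]); i++)
    handler(user_data, &data[i]);
}

/* Pull wrapper: buffers everything the push parser emits. */
typedef struct {
  sketch_triple *items;
  size_t count, capacity, next;
  int failed;
} sketch_pull_parser;

/* Callback given to the push parser; appends to a growing array.
 * Only the pointers are copied (assumes the strings stay valid). */
static void sketch_collect(void *user_data, const sketch_triple *t) {
  sketch_pull_parser *pp = user_data;
  if (pp->failed)
    return;
  if (pp->count == pp->capacity) {
    size_t cap = pp->capacity ? pp->capacity * 2 : 4;
    sketch_triple *grown = realloc(pp->items, cap * sizeof(*grown));
    if (!grown) {
      pp->failed = 1;
      return;
    }
    pp->items = grown;
    pp->capacity = cap;
  }
  pp->items[pp->count++] = *t;
}

/* Runs the whole push parse up front, then hands triples out one by one. */
sketch_pull_parser *sketch_pull_start(void) {
  sketch_pull_parser *pp = calloc(1, sizeof(*pp));
  if (!pp)
    return NULL;
  sketch_push_parse(sketch_collect, pp);
  if (pp->failed) {
    free(pp->items);
    free(pp);
    return NULL;
  }
  return pp;
}

int sketch_pull_done(const sketch_pull_parser *pp) {
  return pp->next >= pp->count;
}

/* Returns the next buffered triple, or NULL when none are left. */
const sketch_triple *sketch_pull_next(sketch_pull_parser *pp) {
  return sketch_pull_done(pp) ? NULL : &pp->items[pp->next++];
}

void sketch_pull_free(sketch_pull_parser *pp) {
  if (!pp)
    return;
  free(pp->items);
  free(pp);
}
```

The cost of this approach is visible in sketch_pull_start(): the entire parse runs before the first triple can be pulled, so memory use grows with the document and the caller cannot stop the underlying parse early.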

For V2 there would also be a bunch of other API cleanups:

So in summary: this is not being done soon.

7. Call for participation

This is your opportunity to help more directly with Redland, in particular with the language bindings, as there is a trickle of patches and fixes to these that take me some time to look at and release.

These are the areas I've seen that can benefit from an active person:

and deprecate / remove the bindings for C#, Java and Tcl. They stay in Subversion, but are no longer shipped.

Saying "yes" to one of the roles above would mean gaining the role in the bug tracker for that area and commit access to the Redland Subversion repository for that area, which might mean adding a new area if needed. The single bindings package might also be split into individual language packages, which would mean a Subversion change to match.

Thanks for reading.

Dave Beckett,
California, USA, 2007-02-18