State of Redland 2007-02
Redland was born 2000-08. Happy 6.5th birthday!
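This is a review of the roughly 15 months since I moved
to the USA in Oct 2005 to work for Yahoo! Media Group in Sunnyvale,
California. It covers the project's users, the state of the
libraries, recent releases, the main challenge, pending tasks, future
ideas and a call for participation.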
(This is on the web at
http://librdf.org/2007/02/18-state/)
1. Redland Users
Redland is made available by several Linux, Unix and other
open source projects such as:
- Debian (sarge, etch)
- Fedora (FC4 onwards) : just Raptor
- FreeBSD Ports
- Gentoo
- Mandriva (9.1 onwards) : Raptor and Redland
- SUSE (9.2 onwards)
- Ubuntu (breezy, hoary, dapper, edgy)
and the libraries are also used inside other applications
and services, for example:
- ActiveRDF ruby RDF
- Amaya web browser and HTML Editor
- Ardour digital audio workstation
- Hydrogen simple drum machine/step sequencer
- Morla RDF graphical editor
- My Opera
- Nepomuk KDE semantic desktop app
- The Venice Project client side (I think!)
- Venus feed aggregator
- Yahoo! Food, TV, Personal Finance ... web sites
- ... but I am not keeping track of these very well in the
applications list ...
2. State of the libraries
My summary of the high-level state of the libraries is:
- Raptor syntax parsing and serializing: libraptor
  - Very mature. The API rarely changes; most work is bug fixes, new
    features for existing parsers/serializers, or entirely new ones.
- Rasqal query parsing, executing: librasqal
  - Under development. The API changes with each release because it is
    not complete and the SPARQL query engine implementation is not
    fully functional.
- Redland RDF API and triple stores: librdf
  - Mature. Some API change is happening to add new features, especially
    for query and storage.
- Binding languages
  - A mixture of mature bindings such as Perl and Python, which are well
    tested, working and complete, and immature ones with little testing
    or known to be incomplete, such as Tcl and Java.
I feel the bindings are too large for one person to maintain, since
that person needs skills in all N languages, unless that person is me
and I do nothing else!
3. Releases
For each of the libraries, the period above has seen the following
releases with major changes:
- Raptor 1.4.8 - 1.4.14 (7 releases)
  - A new user tutorial covering the entire API was written.
  - A new RSS tag soup parser was added.
  - New Atom 1.0, RSS 1.0, Turtle and DOT serializers were added.
- Rasqal 0.9.11 - 0.9.13 (3 releases)
  - Updated the SPARQL syntax support to match the November 2005 and
    April 2006 W3C Working Drafts.
  - Can now serialize query results to JSON.
  - Added APIs to manage query results serializing.
  - The query engine had its ordering, distinct and limit support fixed.
  - Lots of internal query engine changes, in particular to split the
    query parsing ('prepare') from the query execution ('execute').
    These were too intertwined in earlier versions. So now you can
    nearly execute the same query multiple times.
- Redland 1.0.3 - 1.0.5 (3 releases)
  - A new PostgreSQL storage was added.
  - Many fixes for SQLite storage.
- Language Bindings 1.0.3.1 - 1.0.5.1 (3 releases)
  - Many fixes were made across all the bindings, especially to handle
    query results.
  - The Python and Ruby bindings got many fixes.
All of the libraries have benefited from better API documents, using
gtk-doc to replace the older kernel-doc, giving better DocBook and
better HTML output. The entire project also switched over from CVS
to Subversion early in 2006.
4. Challenge
The main challenge I see is to make the project more scalable -
moving from the current state where I do all the packaging and am the
main developer. To help this, my goals for 2007 are to:
- Try to make the development more of a shared task
- Make it easier to work on just part of Redland
- Turn the main website into a shared read/write developer resource
- Schedule #redland IRC developer meetings if that will help give
the project more of a regular heartbeat
5. General Tasks
More of a wishlist than an ordered list
- Think about a License change to Apache2 only.
- Make Redland turn SPARQL into underlying SQL queries when possible.
- Create the redland developer's site in something like Drupal.
- Start the redland (librdf) API tutorial.
- Create some documentation to explain the libraries' structure and relationships.
- Consider not shipping raptor and rasqal inside the redland tarball
- Create documentation on the data flow inside the libraries
- Figure out whether to keep writing manual pages as well as gtk-doc. (DRY)
- Figure out where module/implementation documentation goes, such
as storage options in redland, parser features in raptor etc. This
is needed in C and in the bindings as it is not about the actual
functions called. (DRY)
- The demos need to be updated and the changes made
put back into subversion.
- A SPARQL protocol endpoint demo would be good to have
DRY = Don't Repeat Yourself
5.1 Pending stuff
There are several tasks already in progress either sitting in a
patch, in Subversion or underway separately.
- A new schema for the SQLite store: me (patch)
- Redland transaction support: me (in Subversion)
- Object-based PHP5 bindings: Yahoo! (pending)
- SPARQL syntax extensions called LAQRS: me (in Subversion)
- Apache2 mod_sparql: David Reid (separate project)
- A new native Ruby binding not using SWIG: somebody on IRC
- Complete the Raptor GRDDL support: me (in Subversion)
5.2 Raptor tasks
- Complete the GRDDL support: nearly done
- Bug fixes only for 2007
5.3 Rasqal tasks
- Make Rasqal be able to execute complete SPARQL
- Make SPARQL OPTIONALs work
- Make SPARQL GROUP work
- Make SPARQL UNION work
- Make datatypes work, especially xsd:date and xsd:decimal (bignum library)
- Read result sets from the SPARQL query results XML
- Write a query optimiser
- Add a way to declare extension functions
- Look into language extensions
- Address query engine denial of service:
  - limit query wall clock time
  - limit triple pattern matches
  - callback to allow application to abort queries?
  - limit memory use?
  - limit sorting of results?
  - limit URI fetching: done now with the raptor changes
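The first three denial of service limits above (wall clock time, triple pattern matches and an application abort callback) could be combined into one per-query budget object. This is only a rough sketch in Python rather than C, and none of it is Rasqal API: QueryBudget, QueryAborted, charge_match and match_pattern are all hypothetical names made up for illustration.

```python
import time


class QueryAborted(Exception):
    """Raised when a query exceeds one of its configured limits."""


class QueryBudget:
    """Hypothetical per-query limits; not part of the Rasqal API."""

    def __init__(self, max_seconds=None, max_matches=None,
                 should_abort=None, clock=time.monotonic):
        self.max_seconds = max_seconds    # wall clock limit, or None
        self.max_matches = max_matches    # triple pattern match limit, or None
        self.should_abort = should_abort  # optional application callback
        self._clock = clock
        self._started = clock()
        self.matches = 0

    def charge_match(self):
        """Call once per triple pattern match; raises when over budget."""
        self.matches += 1
        if self.max_matches is not None and self.matches > self.max_matches:
            raise QueryAborted("triple pattern match limit exceeded")
        if (self.max_seconds is not None
                and self._clock() - self._started > self.max_seconds):
            raise QueryAborted("wall clock limit exceeded")
        if self.should_abort is not None and self.should_abort():
            raise QueryAborted("aborted by application callback")


def match_pattern(triples, pattern, budget):
    """Yield triples matching pattern (None is a wildcard), charging the budget."""
    for triple in triples:
        if all(p is None or p == t for p, t in zip(pattern, triple)):
            budget.charge_match()
            yield triple
```

The point of charging at the match level is that every limit is checked in one place in the engine's inner loop. Memory and sorting limits are left out here because they need hooks elsewhere.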
5.4 Bindings tasks
- Split the single language bindings package to be one per-binding.
That would be: Perl, PHP5, Python and Ruby
- Make the Perl binding into a CPAN installable tarball - partially done but not entirely working
- Deprecate or remove bindings that have no active maintainer.
These would be C#, Java and Tcl.
6. Future Ideas
6.1 New Version control system
The idea is to move from Subversion to a version control system that
is friendlier to distributed development. This is more speculative
and I am giving no firm commitment that it will happen soon, since
Subversion is stable and well supported.
My requirements for a new VCS:
- Distributed - no central repository required
- Can operate networkless
- Friendly to managing patches
- Quick
- Reliable and successful (not a research project or bleeding edge)
- Mature
GIT seems one possibility - I tried this conversion already and it
worked well. With Mercurial I couldn't get the repository converted
without losing information. SVK I'm not so sure about, as I don't like
VCSes that are layered on others, e.g. CVS still leaks its original
RCS basis. I didn't try DARCS. Arch / Bazaar / Bazaar-NG is too
bleeding edge.
This is a medium term goal.
6.2 Raptor Version 2
This is a break-the-binary-API choice, not a rebuild.
The main reason to do this would be to add a 'world' style
argument to constructors, like redland has and similar to
the curl handle, APR pool or BDB environment.
This would mean that raptor_init() and raptor_finish() would be
replaced by something like rw = raptor_new_world() and
raptor_free_world(rw).
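To make the 'world' idea concrete, here is a rough sketch of the pattern in Python rather than C. The real proposal is the C functions named above; the World and Parser classes and the parsers_created counter are hypothetical stand-ins, just to show library-wide state moving from hidden globals into an object that every constructor receives.

```python
class World:
    """Hypothetical stand-in for a 'world' object owning library-wide state."""

    def __init__(self):
        self.open = True
        self.parsers_created = 0  # example of state that would otherwise be global

    def close(self):
        """Plays the part of a free-the-world call."""
        self.open = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


class Parser:
    """The constructor takes the world explicitly rather than relying on globals."""

    def __init__(self, world, syntax):
        if not world.open:
            raise RuntimeError("world has been freed")
        self.world = world
        self.syntax = syntax
        world.parsers_created += 1
```

With this shape two worlds in one process do not share any state, which is the property the curl handle, APR pool and BDB environment also give.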
One other reason to do this would be to add
a pull-style triple parser, where the model is:
  parser = new RDFXMLparser()
  parser.start_parse( { URI => uri } )
  while (not parser.done())
    triple = parser.raptor_get_next_triple()
    ...
  delete parser
... rather than the current one of receiving triples via a callback.
However, this would need either a pull-based WWW library (I know of
only libwww and I don't want to use that), batching up the triples in
memory by wrapping the push-based parser, or multiple threads, which
has its own set of problems. This would also need an update to the
raptor_iostream class to add read methods, but that's easier than the
first problem. So this is likely not V2 stuff.
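The "batch up the triples in memory" option can be sketched as below, in Python rather than C. This is an illustration only: push_parse is a toy stand-in for a callback-based parser (it reads whitespace-separated 'subject predicate object' lines, not RDF/XML), and PullParser and its method names are hypothetical, loosely following the pull model described above.

```python
from collections import deque


def push_parse(source, handler):
    """Toy push-style parser: calls handler(triple) once per triple found.

    Hypothetical stand-in; each line of source is 'subject predicate object'.
    """
    for line in source.splitlines():
        parts = line.split()
        if len(parts) == 3:
            handler(tuple(parts))


class PullParser:
    """Wraps the push parser by buffering every triple in memory first."""

    def __init__(self):
        self._queue = deque()

    def start_parse(self, source):
        # The whole document is parsed here; the callback only appends.
        push_parse(source, self._queue.append)

    def done(self):
        return not self._queue

    def get_next_triple(self):
        return self._queue.popleft()


# Usage, following the pull model: the caller drives the loop.
parser = PullParser()
parser.start_parse("a b c\nd e f")
while not parser.done():
    triple = parser.get_next_triple()
```

The cost is plain to see: start_parse does not return until the whole input has been parsed and held in memory, so this gives the pull-style interface without any of the streaming benefit.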
For V2 there would also be a bunch of other API cleanups:
- Rename all raptor_foo functions to be raptor_parser_foo where they
really are about parsers
- Ditch the URI context/data and use raptor_world to hold that
- Alter raptor_statement to have 4 components
including a context / graph / formula so that Raptor could parse N3.
Possibly rename it to raptor_triple.
So in summary: this is not being done soon.
7. Call for participation
This is your opportunity to help more directly with Redland, in
particular with language bindings, as there is a trickle of patches
and fixes to these that take me some time to get to looking at and
releasing.
These are the areas I've seen that can benefit from an active
person:
- OSX porter / (ObjC binding maintainer)
- Win32 porter
- Perl binding maintainer
- Python binding maintainer
- (New Ruby binding?)
- (New PHP5 binding: Yahoo! pays me to look after this)
I would also deprecate / remove the bindings for C#, Java and Tcl.
They would stay in Subversion, but no longer be shipped.
Saying "yes" to one of the roles above would mean gaining
the role in the bug tracker for the area and gaining commit access to
the Redland Subversion for the area, which might mean adding a new
area if needed. If the single bindings package is split into
individual language packages, there may also be a Subversion change
to match.
Thanks for reading.
Dave Beckett,
California, USA, 2007-02-18