For a while now I’ve been working on the standardisation of XCRI-CAP, an XML specification for exchanging data about courses, used in things like the HEAR and the JISC Course Data programme. As part of that process I’ve also been building a parser for XCRI in Java (Xcri4j), partly because I thought it might be useful, and partly because it has helped me identify ambiguities or problems in the specification from an implementer’s perspective. In structure it’s quite similar to XcriCap-Net, a library in C# for doing the same kind of thing.
What I found interesting is that it’s far easier to write a code-based parser than to use any kind of schema-driven or schema-generated system. That’s partly because XML Schema is, frankly, crap, but also because many of the business rules you’re interested in from a data viewpoint fall outside its scope.
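To illustrate the kind of rule I mean, here is a minimal sketch of a cross-field check that XML Schema 1.0 has no way to express: a course presentation shouldn’t end before it starts. The `Presentation` record and `validate` method are hypothetical, simplified stand-ins rather than Xcri4j’s actual model; the point is only that such a rule takes a couple of lines of ordinary code.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model class; not the Xcri4j API.
public class PresentationRules {

    record Presentation(String id, LocalDate start, LocalDate end) {}

    // Cross-field business rule that XML Schema 1.0 cannot express:
    // a presentation must not end before it starts.
    static List<String> validate(Presentation p) {
        List<String> problems = new ArrayList<>();
        if (p.start() != null && p.end() != null && p.end().isBefore(p.start())) {
            problems.add("Presentation " + p.id() + " ends (" + p.end()
                    + ") before it starts (" + p.start() + ")");
        }
        return problems;
    }

    public static void main(String[] args) {
        Presentation ok = new Presentation("p1",
                LocalDate.of(2012, 9, 1), LocalDate.of(2013, 6, 30));
        Presentation bad = new Presentation("p2",
                LocalDate.of(2013, 6, 30), LocalDate.of(2012, 9, 1));
        System.out.println(validate(ok));  // prints []
        System.out.println(validate(bad)); // prints one problem message
    }
}
```

XML Schema 1.1 added assertions that can cover some rules like this, but many business rules (lookups against controlled vocabularies, checks spanning several documents) are still easier to write and test as plain code.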
I’ve also found it much easier to write a more forgiving parser starting from code. XML purists may scoff, but I think it’s important to create systems that don’t simply throw your data back in your face if you mix up which of the six namespaces in the document apply to which elements, or indeed just give up and don’t bother using namespaces at all. And while XML is supposed to be case-sensitive, in practice I’d rather accept the data than get all particular about capital letters in tag names.
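As a rough illustration of this kind of leniency, the sketch below finds child elements by local name, ignoring both the namespace URI and the case of the tag. It uses the JDK’s built-in DOM API (`javax.xml.parsers`, `org.w3c.dom`) rather than JDOM so that it runs without dependencies, and the class and method names (`LaxLookup`, `childrenLax`) are hypothetical, not Laxml’s actual API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class; not the Laxml API.
public class LaxLookup {

    // Local part of an element's name, whether or not the parse was namespace-aware.
    static String localName(Element e) {
        String name = e.getLocalName();
        if (name == null) { // non-namespace-aware parse: strip any prefix by hand
            name = e.getNodeName();
            int colon = name.indexOf(':');
            if (colon >= 0) {
                name = name.substring(colon + 1);
            }
        }
        return name;
    }

    // Child elements whose local name matches, ignoring namespace URI and case.
    static List<Element> childrenLax(Element parent, String wanted) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && localName((Element) n).equalsIgnoreCase(wanted)) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog xmlns:dc='http://purl.org/dc/elements/1.1/'>"
                + "<course><dc:title>Maths</dc:title></course>"
                + "<course><Title>History</Title></course>" // no namespace, wrong case
                + "</catalog>";
        Element root = parse(xml).getDocumentElement();
        for (Element course : childrenLax(root, "course")) {
            for (Element title : childrenLax(course, "title")) {
                System.out.println(title.getTextContent()); // prints Maths, then History
            }
        }
    }
}
```

Both titles are found, even though one is in the Dublin Core namespace and the other has no namespace and a capitalised tag name. A real implementation would probably also record each such mismatch so it can be reported.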
So I’ve built my parser so that it can still throw exceptions for these kinds of common problems, but also gives you the corrected data, so an application can log the exception or provide feedback but still actually use the data. Alternatively, you can ask the parser to process the data quietly, writing warnings to your logs rather than bothering you with exceptions. In either case you can follow the model of being generous when importing while ensuring better data validity when exporting.
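A minimal sketch of that dual-mode design follows, assuming a hypothetical `LenientParser` that only fixes the case of tag names. The `Mode` values, the `CorrectedDataException` class and the list of known names are illustrative, not Xcri4j’s actual API. In `EXCEPTION` mode the exception carries the corrected data, so a caller can report the problem and still use the result; in `QUIET` mode the same fixes are just logged via `java.util.logging`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical parser illustrating "throw, but hand back corrected data"; not Xcri4j.
public class LenientParser {

    // Report problems via an exception, or just log them.
    enum Mode { EXCEPTION, QUIET }

    // Unchecked exception that still carries the corrected data for the caller.
    static class CorrectedDataException extends RuntimeException {
        final List<String> corrected;
        final List<String> warnings;

        CorrectedDataException(List<String> corrected, List<String> warnings) {
            super(warnings.size() + " problem(s) corrected: " + warnings);
            this.corrected = corrected;
            this.warnings = warnings;
        }
    }

    private static final Logger LOG = Logger.getLogger(LenientParser.class.getName());
    // Illustrative set of element names whose case gets normalised.
    private static final Set<String> KNOWN = Set.of("catalog", "provider", "course", "presentation");

    private final Mode mode;

    LenientParser(Mode mode) {
        this.mode = mode;
    }

    // Maps known tag names to their lower-case form, noting every fix made.
    List<String> parseTagNames(List<String> raw) {
        List<String> corrected = new ArrayList<>();
        List<String> warnings = new ArrayList<>();
        for (String name : raw) {
            String lower = name.toLowerCase(Locale.ROOT);
            if (!name.equals(lower) && KNOWN.contains(lower)) {
                warnings.add("'" + name + "' treated as '" + lower + "'");
                corrected.add(lower);
            } else {
                corrected.add(name);
            }
        }
        if (!warnings.isEmpty()) {
            if (mode == Mode.EXCEPTION) {
                throw new CorrectedDataException(corrected, warnings);
            }
            for (String w : warnings) { // QUIET: log each fix and carry on
                LOG.warning(w);
            }
        }
        return corrected;
    }

    public static void main(String[] args) {
        List<String> input = List.of("catalog", "Course", "PRESENTATION");
        try {
            new LenientParser(Mode.EXCEPTION).parseTagNames(input);
        } catch (CorrectedDataException e) {
            System.out.println("Feedback: " + e.getMessage());
            System.out.println("Still usable: " + e.corrected);
        }
        List<String> quiet = new LenientParser(Mode.QUIET).parseTagNames(input);
        System.out.println("Quiet result: " + quiet);
    }
}
```

Making the exception unchecked keeps quiet-mode callers free of try/catch boilerplate; a checked exception would also work if you want to force callers to acknowledge the corrections.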
I’ve abstracted the utility classes that perform these forgiving XML operations into a little project of their own on GitHub called Laxml (“less anal XML”), as they may be of use in other XML-based projects. They sit on top of JDOM, the popular XML library for Java.
I haven’t yet converted the whole parser to use Laxml, or covered the whole XCRI model, but it’s on my to-do list.