Creating a simple SQL query editor

I recently needed to add a simple visual interface for building SQL queries based on existing schema models – you know the sort of thing, a bit like the UI you get in Access, but running in the browser.

I really, really tried to find something out there that was open source, lightweight, easy to integrate, and preferably client-side only. The best I could find ticked maybe three of those four boxes.

So I ended up building one from scratch using jQuery UI and jsPlumb. I’m quite pleased with the result:

[Image: screenshot of the visual query editor]

You can see a live version over at BitBalloon. The whole thing should be available under an open source license and up on GitHub soon – it really is a very simple bit of scripting.

Posted in development, open source | Leave a comment

Going old skool with SQL: from Access to PostgreSQL

Well, after several years where I mostly worked on Redis, MongoDB and client-side browser storage, I’m currently working on a project that is most definitely old-skool SQL. So I spent much of today writing code to handle importing Access databases into PostgreSQL with all the nasty little details that involves.

I’ve now got a two-stage import process with preflight checks working with some of the truly awful Access databases my client sent over – I asked for the worst ones they had to hand and they didn’t disappoint. They are full of gnarly nastiness like OLE fields, bad keys, duplicate rows, weird Access-specific stuff like Switchboard, dangling references and more. Still, nothing better to tax my tests.

Jackcess has proved itself very useful, as you might imagine, but a lot comes down to how the transactions are orchestrated to handle all the kinds of errors that happen when real-world Access data is turned into Postgres commands.

I contemplated importing the data before setting up constraints, but in the end went with deferrable constraints and running all the batches within transactions. The only downside is that I’ve had to opt for a two-stage process: I exclude the worst-offending tables in a first-pass preflight check that I then roll back, after which I commit the final import in a second pass.

(The reason for this is basically that if a statement throws a batch exception, I have to either commit or roll back the transaction and start again; if I commit and then start a new transaction, I lose any data relying on referential integrity checks from the previous transaction. I can’t just try to continue the transaction after an error – the driver quite rightly tells you off for that sort of thing.)
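The two-pass orchestration can be sketched as follows. This is a minimal, hypothetical illustration of the control flow only – the class, the table names, and the importTable callback are all invented stand-ins for the real JDBC batch logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of the two-pass import. The importTable callback
// stands in for running one table's INSERT batches inside the shared
// transaction; it returns false where the real code would catch a
// BatchUpdateException.
class TwoPassImporter {

    // Pass 1: attempt every table, recording the ones that fail.
    // In the real code this whole pass runs inside a transaction (with
    // constraints deferred) that is rolled back regardless of outcome.
    static List<String> preflight(List<String> tables, Predicate<String> importTable) {
        List<String> excluded = new ArrayList<>();
        for (String table : tables) {
            if (!importTable.test(table)) {
                excluded.add(table);
            }
        }
        return excluded;
    }

    // Pass 2: import only the tables that survived preflight, then commit.
    static List<String> finalPass(List<String> tables, List<String> excluded) {
        List<String> imported = new ArrayList<>();
        for (String table : tables) {
            if (!excluded.contains(table)) {
                imported.add(table);
            }
        }
        return imported;
    }
}
```

Because the preflight transaction is rolled back in its entirety, the second pass starts from a clean slate and can rely on the deferred constraint checks all succeeding at commit time.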

Also, the lack of an UPSERT command or IGNORE directive in PostgreSQL is a real pain. Glad that’s on its way in 9.5.
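The sort of workaround this forces looks like the following – a hedged sketch with an invented table, not the actual import code. Pre-9.5 you have to emulate “insert if not present” yourself, which isn’t even safe under concurrent writers:

```sql
-- Pre-9.5 emulation of "insert if not present" (invented example table;
-- fine for a single-writer import, but not atomic in general):
INSERT INTO widgets (id, name)
SELECT 42, 'sprocket'
WHERE NOT EXISTS (SELECT 1 FROM widgets WHERE id = 42);

-- What 9.5's new syntax promises:
INSERT INTO widgets (id, name) VALUES (42, 'sprocket')
ON CONFLICT (id) DO NOTHING;
```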

Still, there’s time to try a different tack if this proves problematic – right now I’m just happy I’m able to extract a working schema and a good quantity of valid data from these basket-case databases.

It also makes a change from all that REST and JSON stuff I mostly seem to have done in the past few years 🙂

Posted in development | Leave a comment

HtmlCleaner 2.10 released

Get it here!

Posted in Uncategorized | Leave a comment

Tracking student blogs using Google Spreadsheets and WordPress

I’m currently teaching a first-year module on Open Source Software, one of the requirements of which is that students write up their findings as blog posts. For that reason I thought it might be useful to be able to keep track of how much my students have been writing. Given there are 80 students on the module, some automation here would also be helpful!

I remembered an impressive set of techniques developed by Martin Hawksey for tracking Jisc project blogs, and so I decided to use these as a starting point.

I created a Google Spreadsheet, and a form for capturing each student’s name and blog URL. I then added a couple of scripts – based on Martin’s examples above – to retrieve the WordPress RSS feed for each blog, and put the post count and the date of the last post into cells in the same row.

Now, WordPress only gives you a maximum of 10 entries in a feed by default, but for my purposes that’s still enough to get a sense of which students are struggling with their writing tasks. I just use some conditional formatting to show me anyone who hasn’t posted anything, and anyone who has made fewer than 5 posts this semester.
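The per-blog check itself is simple: parse the RSS, count the item elements, and take the first pubDate, since WordPress feeds list posts newest-first. Here is the same logic sketched in Java rather than Apps Script – the FeedStats class and method names are my own invention, not the actual spreadsheet scripts:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the per-blog feed check: post count and the
// date of the most recent post, derived from the RSS XML.
class FeedStats {

    private static Document parse(String rssXml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
    }

    // WordPress feeds contain at most 10 <item> elements by default,
    // so this is really "post count, capped at 10".
    static int postCount(String rssXml) throws Exception {
        return parse(rssXml).getElementsByTagName("item").getLength();
    }

    // Items are ordered newest-first, so the first <pubDate> in the
    // document is the date of the latest post.
    static String lastPostDate(String rssXml) throws Exception {
        NodeList dates = parse(rssXml).getElementsByTagName("pubDate");
        return dates.getLength() == 0 ? null : dates.item(0).getTextContent();
    }
}
```

Since the default feed is capped at 10 items, a count of 10 just means “at least 10” – which is still enough to spot the students who are struggling.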

I’ve experimented with some other layouts, for example using D3.js visualisations, but just a list of blogs with some red-amber-green coding seems to be the most practical.

Another benefit of having the list of blogs in a spreadsheet is that it made it quite simple to generate an OPML file to share with students, which they can import into WordPress Reader to follow everyone else on the course.

One limitation is that I don’t seem to be able to get the functions to be added automatically to each new row created by the form – I have to paste them over the new rows. Still, overall it’s not a bad solution.

You can see a copy of my spreadsheet here (without the actual student blogs on it) along with the form for collecting blog URLs, and you can get the scripts from GitHub.

Posted in mashups | 1 Comment

HtmlCleaner 2.9

I’ve finally released version 2.9 of HtmlCleaner!

This month I also had to answer my first ever official support request for HtmlCleaner in my “day job” – it appears a researcher at the University of Oxford is using it as part of a project analysing legislation passed by the State Duma of the Russian Federation 🙂

Posted in Uncategorized | 1 Comment

Looking at Kwan and West’s Open Source Adoption Model

In my last post I looked at the first part of Kwan and West’s paper, which concerns the phases of technology adoption. Now I’m turning to the second part, which concerns the adoption model.

Kwan and West propose the following model for open source adoption:

[Image: Open Source Adoption model – Kwan and West 2005]

The model has two decision points: the first where the choice set of applicable solutions is identified, and the second where a selection is made from the set available.

What I find interesting here is how strongly they position software policy in the first decision point. They identify policy, rightly I think, as being at least partly reactive: determined to some extent not by the intentions of the organisation so much as by its past history (the feedback loop in the diagram). There is also the contextual influence of the culture of the firm and the industry it sits within.

Within UK higher education we have a reasonable data set for how open source is represented in software policies from 10 years of OSS Watch surveys:

[Image: open source in software policies]

To some extent “policy” is a little misleading in Kwan and West’s paper, as they elaborate it as covering not just formal policies but also the “pattern of attitudes and decisions that constitute a de facto policy”, and so it could be said to encompass institutional cultural attitudes.

(In practice I’ve found that in higher education, policy is set at an organisational level, but cultural attitudes towards, and awareness of open source can vary by department or even by group within a department.)

The second factor influencing the choice set is the application context, which is where we refer back to the model of phases of strategic importance to determine the relative weighting of features, risks, and costs.

Finally, there is the set of available products that can be considered.

These three factors determine the choice set, from which a selection is then made at the second decision point.

In my experience the first decision point, determining the choice set, is where many candidate open source solutions tend to be excluded, for quite a few different reasons:

  • The process of procurement is primarily passive in nature, expecting suppliers to respond to RFPs at their own expense, a route that tends to favour suppliers of proprietary systems
  • The organisation has low awareness of open source options, and does not look beyond its set of usual suppliers when identifying available products
  • The procuring organisation has a set view on the composition of the solution, requiring a single product from a single supplier, for example, rather than considering an integration of different components to perform the same functions.

When we looked at the OSS Watch survey results for 2013, we found that a perception that “there isn’t a solution that meets our needs” was the overall number one supply-side reason given for deciding against open source, while interoperability and migration issues were the number one demand-side reason.

For me this indicates that there is perhaps something missing from Kwan and West’s model, or that perhaps the “policy” component is trying to cover too many different factors. For example, would a passive-RFP approach rather than an active discovery approach fit within “policy” or is it an influence on “available products”? Also, where do constraints such as contractual obligations and interoperability issues (particularly lock-in) reside in the model?

So, that’s the first decision point. The second decision point is, I think, fairly standard, though here the “selection metrics” factor has some special considerations for open source that can be overlooked.

When I talk to organisations about procurement (with my OSS Watch hat on) I tend to talk a fair amount about sustainability, and in particular how to evaluate open source projects for their likely future viability. This isn’t necessarily a criterion used in traditional selection metrics, or if it is, it isn’t one that is easy to evaluate or to find good data for.

Another aspect of selection is pre-sales activities such as demonstrations and pilot systems; again, there is sometimes a problem here with procuring organisations’ expectations being conditioned by how closed-source companies tend to work, and this isn’t included anywhere in the model. I guess Kwan and West have generally operated from a presumption of good procurement practice by the organisation, whereas in the context I’m interested in (UK HE) I’m very much aware of the gaps between policy, process and practice, and of the impact these have, and so can’t really exclude them from the model.

Finally, I think the model is suited to procurement of systems at all phases of strategic alignment apart from strategic systems; in these cases I think it’s more likely that a solution does not already exist, but has to be created in partnership with suppliers as a “pre-procurement” activity. However, as I’ve already pretty much concluded that there is no strategic IT in HE, it’s not something I need to dwell on.


I think Kwan and West have made a good stab at a useful conceptual model here, and in particular I think the strategic alignment phases concept elaborated in the first half of the chapter is a very useful one. The adoption model has some good features, but seems to omit, or place insufficient emphasis on, factors that I know affect selection practices, so it would require some further tweaking before I could make good use of it. The authors have done a fair amount of follow-up work which I’ve also put on my reading list, and hopefully they revisited the model.

Posted in open source, phd, Uncategorized | Leave a comment

Is there such a thing as “strategic IT” in Higher Education?

In their 2005 paper, A conceptual model for enterprise adoption of open source software [1], Kwan and West use as a starting point the “Strategic Grid” developed originally by McFarlan, McKenney and Pyburn (1983) [2] which divides firms or divisions of a company into four categories depending on how valuable IT is to the performance of the division:

  • Strategic
  • Factory
  • Support
  • Turnaround

However, Kwan and West contend that not all IT systems are of equal importance within a division or company – just because IT is “strategic” to a division doesn’t mean all of its IT systems are!

So Kwan and West adapt the model as a set of stages of business alignment for individual systems:

  • Strategic
  • Mission Critical
  • Support
  • Laboratory

Strategic systems are those that provide actual competitive advantage over competitors; mission critical systems are those needed to support the operation of the business, and whose interruption can cause loss of revenue; support systems provide business value through internal efficiency; and finally laboratory systems are non-production systems used to pilot innovations and experiment with new technologies.

Why “stages” and not “categories”? Well, the authors point out that many IT systems in organisations exhibit strategic shifts over time – software originally deployed as a pilot at the laboratory stage may be implemented as a strategic system; then eventually the system becomes commodified in the market and downshifts to being mission critical.

It seems a pretty sensible model – but how does it look when applied to IT in the Higher Education sector? While the model seems reasonably straightforward, applying it to actual deployed systems is not. Some of the systems deployed in a typical HE institution include:

  • Student information system
  • Library management system
  • Research management
  • Learning management system (VLE)
  • Network management system
  • User and account management system
  • Timetabling and enterprise resource management
  • Human resources
  • Finance information system
  • Mail and groupware systems
  • Content management system
  • Institutional repository
  • Business intelligence and analytics system
  • Curriculum management system

… and that’s before we get into the various blogs, wikis, specialized research tools and teaching tools.

What I find striking is that while some of these are undoubtedly mission critical (the student information system in particular), it’s difficult to make a case for any of them being strategic using Kwan and West’s definition. Do any of these systems provide actual (not imagined) competitive advantage? I think business intelligence and analytics is frequently pitched as such, but in practice it would mostly support internal efficiencies in many institutions.

For some institutions that specialise in online delivery the VLE/LMS might be considered strategic; and for the Open University both the LMS and content management may have a more strategic role than at a typical university, but even there it’s not 100% clear.

So, is it true that “IT doesn’t matter”, as Carr (2003) claimed [3]? Carr argues that information technologies, like previous technologies, become ubiquitous and commodified over time. Therefore, they are now strategically unimportant because individual organisations can no longer gain a competitive advantage from IT.

McCredie (2003) considered Carr’s views from an HE perspective [4] and concurred that IT in education is “strategically invisible”, being a requirement to stay in business but not offering competitive differentiation. However, he does contend that “If staying in business is not a strategic concern, I do not understand the meaning of the term”, which is certainly true; though it does still place IT systems in the mission-critical rather than the strategic phase of the Kwan and West model.

However, McCredie also points out that, if current IT systems are not strategic, this doesn’t rule out the development of future systems that will provide competitive advantage. 

(It may be argued that MOOCs are an attempt to elevate learning management to have strategic value within a modified higher education business model. If the business of the organisation – e.g. Coursera, Udacity, EdX – is fundamentally about the online delivery of teaching, then the system deployed may indeed be strategic.)

So what does this imply?

Well, Kwan and West go further than just identifying the phases of IT systems: they also look at the tradeoffs made in adoption between features, cost and risk as being dependent on the strategic positioning of the system.

For example, for mission-critical systems, organisations will trade off cost and features to minimise risk; for support systems they will trade off features and risk to minimise costs; and for strategic systems they are willing to trade off costs to maximise features. (Features being defined here as “everything that isn’t risk or cost”.)

For laboratory systems it’s less clear-cut, as it depends on the prospective phase of the system being piloted.

This implies that most IT systems in Higher Education will be either mission-critical systems primarily selected for their relatively low levels of risk, or support systems selected for their low cost (in TCO terms). To put it another way, the characteristics that institutions are looking for most in their IT systems are reliability and cost-effectiveness; so the key innovations will be those that enable lower costs or improved reliability – such as virtualisation and software-as-a-service – rather than new capabilities.

One could also put this the other way around and state that, as HE organisations are typically not interested in radically changing their business models, it is unlikely that they will be looking for IT to deliver strategic value, but instead to improve the reliability and costs associated with their operations. Rather, systems that offer new capabilities will be adopted by organisations looking either to disrupt the existing market or to operate within a niche where IT can offer strategic value. (The one potential exception is business intelligence and analytics, but we’ve yet to see examples of actual competitive advantage being realised; it’s also unclear how long such an advantage would persist before the technology is commodified.)

Given this context, it would seem that the future of IT in higher education will be focussed on realising the benefits of commodification.

This includes migrating more support systems from expensive proprietary products, either to open source technologies with lower TCO (e.g. moving more content management to Drupal) or to commodity software-as-a-service offerings. It will also include realising reliability and cost improvements from moving more systems to take advantage of commodity cloud computing.

I’m about to embark on a PhD looking at adoption of open source technologies in Higher Education, so I’m catching up on my reading by doing some blog posts on interesting papers on the topic. So there’ll be more posts like this in the coming months.

The second half of Kwan and West’s paper deals with creating a model for open source adoption using this background, but I’ll leave that for another post.

[1] Kwan, S. K., & West, J. (2005). A conceptual model for enterprise adoption of open source software. The standards edge: Open season, 51-62.

[2] McFarlan, F. W., McKenney, J. L., & Pyburn, P. (1983). The information archipelago – plotting a course. Harvard Business Review, January–February 1983.

[3] Carr, N. G. (2003). IT doesn’t matter. Harvard Business Review, May 2003.

[4] McCredie, J. (2003). Does IT matter to higher education? Educause Review, 38(6), 14-22.

Posted in open source, phd | 3 Comments

DOMImplementation hates HTML5

This doesn’t work:

// impl is an org.w3c.dom.DOMImplementation obtained elsewhere
DocumentType documentType = impl.createDocumentType("html", "", "");
Document document = impl.createDocument(null, "html", documentType);

assertEquals("head", document.getChildNodes().item(0).getChildNodes().item(0).getNodeName());

In fact, it just silently fails to add any child nodes. No exceptions, nada.

This gives the same result:

DocumentType documentType = impl.createDocumentType("html", null, null);
Document document = impl.createDocument(null, "html", documentType);

assertEquals("head", document.getChildNodes().item(0).getChildNodes().item(0).getNodeName());

But *this* does work:

DocumentType documentType = impl.createDocumentType("html", " ", " ");
Document document = impl.createDocument(null, "html", documentType);

assertEquals("head", document.getChildNodes().item(0).getChildNodes().item(0).getNodeName());


Posted in Uncategorized | Leave a comment

HtmlCleaner 2.8 is out

It’s the first release of 2014, and it’s got a nice patch from Rafael that makes it run a lot faster (who knew that just checking whether a String is a valid Double in the XPath processor would cause so much stress?), and another patch from Chris that makes it output proper XML ID attributes in the DOM.

My contributions this time around were more enhancements to “foreign markup” handling, which is important when cleaning up HTML that contains valid SVG or other foreign content. HtmlCleaner wasn’t really written with that sort of use in mind when Vladimir started on it back in 2006, so it involved a fair bit of wrangling, but I think we’re nearly there now.

It’s good to see the number of contributions going up – last release we had an entire GUI contributed by Marton – and I think having a faster release schedule is helping with that. Hopefully one day I’ll be able to make releases that just consist of applying other people’s patches 🙂

HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use. Download HtmlCleaner 2.8 here, or get it from Maven Central.

Posted in Uncategorized | Leave a comment

5 lessons for OER from Open Source and Free Software

While the OER community owes some of its genesis to the open source and free software movements, there are some aspects of how and why these movements work that I think are missing or need greater emphasis.

[Image: Open Education Week 2014]

1. It’s not what you share, it’s how you create it

One of the distinctive elements of the open source software movement is the open development project: a project where software is developed cooperatively (though not necessarily collaboratively) in public, often by people contributing from multiple organisations. All the processes that lead to the creation and release of software – design, development, testing, planning – happen using publicly visible tools. Projects also actively try to grow their contributor base.

When a project has open and transparent governance, it’s much easier to encourage people to voluntarily provide effort, free of charge, that far exceeds what you could afford to pay for within a closed in-house project. (Of course, you have to give up a lot of control, but really, what was that worth?)

While there are some cooperative projects in the OER space, for example some of the open textbook projects, for the most part the act of creating the resources tends to be private; either the resources are created and released by individuals working alone, or developed by media teams privately within universities.

Also, in the open source world its very common for multiple companies to put effort into the same software projects as a way of reducing their development costs and improving the quality and sustainability of the software. I can’t think offhand of any examples of education organisations collaborating on designing materials on a larger scale – for example, cooperating to build a complete course.

Generally, the kind of open source activity OER most often resembles is the “code dump” where an organisation sticks an open license on something it has essentially abandoned. Instead, OER needs to be about open cooperation and open process right from the moment an idea for a resource occurs.

Admittedly, the most popular forms of OER today tend to be things like individual photos, powerpoint slides, and podcasts. That may partly be because there is not an open content creation culture that makes bigger pieces easier to produce.

2. Always provide “source code”

Many OERs are distributed without any sort of “source code”. In this respect, license aside, they don’t resemble open source software so much as “freeware” distributed as executables you can’t easily pick apart and modify.

Distributing the original components of a resource makes it much easier to modify and improve. For example, where the resource is in a composite format such as a PDF, eBook or slideshow, provide all the embedded images separately too, in their original resolution, or in their original editable forms for illustrations. For documents, provide the original layout files from the DTP software used to produce them (but see also point 5).

Even where an OER is a single photo, it doesn’t hurt to distribute the original raw image as well as the final optimised version. Likewise for a podcast or video the original lossless recordings can be made available, as individual clips suitable for re-editing.

Without “source code”, resources are hard to modify and improve upon.

3. Have an infrastructure to support the processes, not just the outputs

So far, OER infrastructure has mostly been about building repositories of finished artefacts but not the infrastructure for collaboratively creating artefacts in the open (wikis being an obvious exception).

I think a good starting point would be to promote GitHub as the go-to tool for managing the OER production process. (I’m not the only one to suggest this; Audrey Watters also blogged this idea.)

It’s such an easy way to create projects that are open from the outset, and it has a built-in mechanism for creating derivative works and contributing back improvements. It may not be the most obvious thing for educators to use, but I think it would make it much clearer how to create OERs as an open process.

There have also been initiatives to do a sort of “GitHub for education” such as CourseFork that may fill the gap.

4. Have some clear principles that define what it is, and what it isn’t

There has been a lot written about OER (perhaps too much!). However, what there isn’t is a clear set of criteria that something must meet to be considered OER.

For Free Software we have the Four Freedoms as defined by FSF:

  • Freedom 0: The freedom to run the program for any purpose.
  • Freedom 1: The freedom to study how the program works, and change it to make it do what you wish.
  • Freedom 2: The freedom to redistribute copies so you can help your neighbor.
  • Freedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.

If a piece of software doesn’t support all of these freedoms, it cannot be called Free Software. And there is a whole army of people out there who will make your life miserable if it doesn’t and you try to pass it off as such.

Likewise, to be “open source” means to support the complete Open Source Definition published by OSI. Again, if you try to pass off a project as being open source when it doesn’t support all of the points of the definition, there are a lot of people who will be happy to point out the error of your ways. And quite possibly sue you if you misuse one of the licenses.

If it isn’t open source according to the OSI definition, or free software according to the FSF definition, it isn’t some sort of “open software”. End of. There is no grey area.

It’s also worth pointing out that while there is a lot of overlap between Free Software and Open Source at a functional level, how the criteria are expressed is also fundamentally important to their respective cultures and viewpoints.

The same distinctive viewpoints or cultures that underlie Free Software vs. Open Source are also present within what might be called the “OER movement”, and there has been some discussion of the differences between what might broadly be called “open”, “free”, and “gratis” OERs which could be a starting point.

However, while there are a lot of definitions of OER floating around, no recognised definitions and labels of this kind have emerged – no banners to rally to for those espousing these distinctions.

Now it may seem odd to suggest splitting into factions would be a way forward for a movement, but the tension between the Free Software and Open Source camps has I think been a net positive (of course those in each camp might disagree!) By aligning yourself with one or the other group you are making it clear what you stand for. You’ll probably also spend more of your time criticising the other group, and less time on infighting within your group!

Until some clear lines are drawn about what it really stands for, OER will continue to be whatever you want to make of it according to any of the dozens of competing definitions, leaving it vulnerable to openwashing.

5. Don’t make OERs that require proprietary software

OK, so most teachers and students still use Microsoft Office, and many designers use Adobe. However, it’s not that hard to develop resources that can be opened and edited using free or open source software.

The key to this is to develop resources using open standards that allow interoperability with a wider range of tools.

This could become more of an issue if (or rather when) MOOC platforms start to “embrace and extend” common formats for authors to make use of their platform features. Again, there are open standards (such as IMS LTI and the Experience API) that mitigate this. This is of course where CETIS comes in!

Is that it?

As I mentioned at the beginning of this post, OER is to some extent inspired by Open Source and Free Software, so it already incorporates many of the important lessons learned, such as building on (and to some extent simplifying and improving) the concept of free and open licenses. However, it’s about more than just licensing!

There may be other useful lessons to be learned and parallels drawn – add your own in the comments.

Posted in cetis, open education, standards | 6 Comments