Going old skool with SQL: from Access to PostgreSQL

Well, after several years where I mostly worked on Redis, MongoDB and client-side browser storage, I’m currently working on a project that is most definitely old-skool SQL. So I spent much of today writing code to handle importing Access databases into PostgreSQL with all the nasty little details that involves.

I’ve now got a two-stage import process with preflight checks working with some of the truly awful Access databases my client sent over – I asked for the worst ones they had to hand and they didn’t disappoint. They are full of gnarly nastiness like OLE fields, bad keys, duplicate rows, weird Access-specific stuff like Switchboard, dangling references and more. Still, nothing better to tax my tests.

Jackcess has proved itself very useful, as you might imagine, but a lot comes down to how the transactions are orchestrated to handle all the kinds of errors that happen when real-world access data is turned into Postgres commands.

I contemplated importing the data before setting up constraints, but in the end went with deferrable constraints and running all the batches within transactions. The only downside is I’ve had to opt for a two-stage process where I exclude the worst offending tables in a first pass preflight check that I then rollback, after which I commit the final import in a second pass.

(The reason for this is basically that if a statement throws a batch exception, I have to either commit or rollback the transaction and start again; if I commit then start a new transaction, I lose any data relying on referential integrity checks from the previous transaction. I can’t just try to continue the transaction after an error – the driver quite rightly tells you off for that sort of thing)

Also, the lack of an UPSERT command or IGNORE directive in PostgreSQL is a real pain. Glad thats on its way in 9.5.

Still, there’s time to try a different tack if this proves problematic – right now I’m just happy I’m able to extract a working schema and good quantity of valid data from these basket cases databases.

It also makes a change from all that REST and JSON stuff I mostly seem to have done in the past few years 🙂

This entry was posted in development. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s