Mass query generation using “The SpreadSheet Trick”

I honestly assumed that most developers already knew about “the spreadsheet trick”. However, in the last few months and years I’ve shown it to a few developers and they go through and interesting mix of emotions. If you know this trick, you’ll know the exact same feeling; shame followed by “can’t argue with results” pride. If you have a tonne of data provided in a spreadsheet, your next step might be to write a little throwaway PHP script to do the conversion and import. Let’s take another, slightly unorthodox approach.

You’ve just been given a spreadsheet of amendments to make to historic orders on your bespoke e-commerce system. For one reason or another (and this is very hypothetical) you weren’t storing VAT individually against orders. You were working it out on the fly. So, your accountant has created a spreadsheet which contains all the historic orders, helpfully providing the order reference and the calculated VAT.

Order Reference VAT
343637 34.64
457424 7.23
2,000 more rows

The next step is where the laziness ingenuity kicks in. Rather than try and write a parser and query builder, let’s just see what needs to happen to make this work as an SQL statement. Insert a column before “VAT”. Switch off your sense of professional pride and type some SQL in the first column. Looks like we need to switch the “VAT” and “Order Reference” columns around. Now your spreadsheet looks like this:

VAT Order Reference
UPDATE Orders SET vat=' 34.64 343637
7.23 457424
2,000 more rows

Well that was easy. Almost there in fact! Since no-one’s looking at your monitor and as such, cannot judge you, let’s continue. We need an extra couple of columns. Create one before and after “Order Reference”.

VAT Order Reference
UPDATE Orders SET vat=' 34.64 ' WHERE (SELECT order_id FROM Orders WHERE ref = ' 343637 ‘);
7.23 457424
2,000 more rows

Now, do a fill down operation (i.e. copy the text columns down for the next 2,000 rows or however big the spreadsheet is). You’ve got a Frankenstein’s monster of SQL, but it looks like it works. The only task that remains is to get that data out of Excel/LibreOffice and into MySQL. Export it as a tab-separated, unquoted file. For extra dirty points, you can just copy and paste straight into gedit. Once in gedit, do a find replace on \t to remove all tabs and suddenly you’ve got immediately executable, fully working, SQL file. Moreover, you’ve got it in about 2 minutes of effort. You know you’ve done this trick a couple of times when you don’t even need to use the spreadsheet. You skip that step, open the CSV file in gedit and use "," in your find/replace commands.

If you’ve done your job correctly, every single one of those queries will work. There’s no fgetcsv dance. You don’t need composer to use your favourite lightweight DBAL. It’s just plain old SQL. Now, one could argue that a correctly written import script would work just as well, but I’d always feel more confident pushing formatted SQL files into MySQL rather than have a throw-away PHP script do the heavy lifting for me.

Now, it goes without saying that this trick is better suited to some scenarios than others. I’ve probably only done it a handful of times in my career so far. To be honest, I’ve told more people about it than I’ve actually done it myself. So, I don’t want any complaints about stupid security holes or performance issues – it’s a the hackiest hack that someone ever hacked.

Sometimes, you just can’t argue with results.

Tagged with: ,
Posted in MySQL

Leave a Reply

Your email address will not be published. Required fields are marked *


You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">