test-driven woopsie

Draft of 2008.11.03 ☛ 2015.04.02

May include: agilelearning in public&c.

So I’ve been trying to refactor a big ball of mud in our Nudge project. There are about five modules that overlap where they shouldn’t, and on inspection they were all more or less separable. So about a week ago (before I came down with a Cold from Hell) I started writing tests to tease them apart, and building little snippets of infrastructure that I could shunt the flow of the program over to, and extracting methods and building singletons and all kinds of great Fowleresque refactoring magic I actually, personally haven’t done before.

And every time I would get so far, and there would be a sudden explosion. Huge swathes of tests failing. Huge big chunks of interconnected code brought down all at once by a single change. I’d back up a few steps in the repository, step towards my goal from a slightly different direction, and then BAM it would come crashing down again.

The codebase is one we’ve written over almost a year, essentially learning TDD in Python (using nose) as we go. So as a primary learner among us, I admit some of the older tests are effective but kinda kludgy; readable but in hindsight what you’d call a bit quirky here and there.

And so finally, maybe a week’s work (being two weeks by the calendar) after undertaking this “minor” refactoring, I stumbled across the same goddamned explosion. Fifth, sixth time.

Test every goddamned thing. Before you write it. Before you assume anything. Test it.

But… even decent paranoiac tendencies won’t save you every time. They’ll improve your chances, but they won’t save you every little bit of confusion.

As it happens, the problem was as simple as can be, but hidden so far from its effects that it was impossible to detect without seventeen (17) new tests to narrow down the scope of its reach: In our old system, we regenerated a huge data structure—basically the lookup table for every opcode in the language—every time we ran the language interpreter. Millions of times, in a given genetic programming run.

The refactoring I was trying wasn’t an optimization so much as a way of making our language extensible, but nonetheless I decided to build that huge data structure once, and then refer to it as needed for the millions of calls. Just look things up instead of building it all from scratch every time.

Word of advice: When you’re writing code TDD, don’t delete small chunks of a persistent structure in your unit tests. Yes, you should test for proper behavior when dependencies are violated… but don’t cause the violations as a consequence of your test.

See, back when we were creating and destroying the language infrastructure every time we created an interpreter, the fact that a few tests deleted instructions was unimportant. Now we’re doing it once, and relying on a single more or less permanent structure’s integrity over many tests at once… any test that changes it qualitatively is a bad, bad team player.

A small but actual software release is imminent, by the way.