Reliability and creativity
Draft of 2018.05.25
May include: GP ↘ testing ↗ &c.
Last time I spent a while musing out loud on something like “What are we asking GP (generative programming, genetic programming, software synthesis, &c) systems to do, really?”
I pointed out that sometimes we “only” want them to do the done thing.1 That is, we want them to discover and produce the “right” model of the data we’ve given them, or to answer the question posed in just the way that an experienced engineer would first do so. We want the answer that doesn’t freak out an expert, or trouble an amateur. In short, we want the GP system to act like a conservative, staid, reliable, sensible human engineer, to start with the simplest possible solution first, and not get to the weird-ass shit ever, if possible. We want it to be born with “Occam’s Razor” gripped tight in its little algorithmic fist, quivering with parsimony.
Over the decades, I’d say that most GP engineering and research has leaned towards that end of the axis. But I’m not sure this was because we have an intrinsic love of Occam’s Razor, so much as because of the limits and practicalities of computational infrastructure, back in the day. In the days of Genetic Programming’s emergence, computing was expensive, memory and storage were dear, and it was impractical to “try all the things” and “save all your work”. As a result, it could just be that a practical habit developed of favoring modest representations, composed of “reasonable-seeming” parts, salted when necessary with a few domain-specific extensions on a per-problem basis. When we searched for a function to fit some data, we searched with arithmetic and maybe a \(\sin(x)\) and a \(\sqrt{x}\) here and there, and dammit we were happy. Things got fancy and weird (by those standards) when we introduced piecewise functions that contained if clauses into traditional mathematical-programming domains.
These were the days when “bloat” was a serious issue in the GP literature. When people filled their houses with Beowulf clusters made from Pentium desktop boxes, and spent weeks of screaming-fast 100-MHz compute power on relatively small symbolic regression problems.
These were also the days when ParetoGP algorithms were invented, which permit slightly more complex expressions to be found, but at the same time avoid spending a disproportionate amount of computational effort on the weird, big possibilities. Leaving time for the weird stuff but mainly sticking to the basics, in other words.
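To make that trade-off concrete, here’s a minimal sketch of the underlying idea, in Python. This is emphatically not the actual ParetoGP implementation, and the candidate list and its `(complexity, error, expression)` format are made up for illustration: given models scored by complexity and error, keep only the non-dominated ones, so the weird, big possibilities survive only when they actually buy accuracy.

```python
# A minimal sketch (not ParetoGP itself): keep only the candidate models
# that are non-dominated in the (complexity, error) trade-off, so that
# extra complexity is tolerated only when it actually buys accuracy.
def pareto_front(models):
    """models: list of (complexity, error, expression) tuples (hypothetical format)."""
    front = []
    for c, e, expr in models:
        dominated = any(c2 <= c and e2 <= e and (c2 < c or e2 < e)
                        for c2, e2, _ in models)
        if not dominated:
            front.append((c, e, expr))
    return sorted(front)

candidates = [
    (3, 0.40, "a*x + b"),
    (5, 0.38, "a*x + b*x**2 + c"),          # barely better, noticeably bigger: dominated
    (4, 0.12, "a*sin(x) + b"),
    (9, 0.11, "a*sin(x) + b*sqrt(abs(x)) + c"),
]
print(pareto_front(candidates))
```

Everything on that front gets some attention; everything off it gets none. That’s the sense in which the weird stuff is allowed in, but only on a budget.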
Bloat is redundant code: a waste of storage space, and a waste of time spent evaluating meaninglessness. And even bloat-free but ornate models are similarly a waste of time, because of the way nonlinearities make things wiggle. Resources had their limits, more then than now, and accuracy was king. People were pleased (in most cases) that a relatively thorough search of the combinatorial space of functions was getting done, as long as we stayed down towards the “simpler” corner of that space. With a dozen or so functions, and a dozen or so variables, it turns out that there are still a lot of models you can compose… but not an unthinkable number.
One was left, even in the earliest GP work, with the sense that when all those fans had run and all those CPU cycles had been exercised, a big proportion of the low-hanging fruit had been considered. We were happy to explore thoroughly under the light post.
But motivating these practical and effective parsimonious approaches, there was always the founding grail quest: the thought of “general intelligence”. Everybody who launched a GP search—whether it was intended to search over symbolic expressions or agent behavior—“had in mind” that whenever we got that next budget approved (or when Moore’s Law had done its work), we could be confident that the very same system could be “switched over” and do something else. That by solving the little, lower-left corner of the vast space of Problem A, we were also, in some indirect way, learning something about some Problem B.
That is, that one problem might generalize to others.
Old times
I’m reminded of this sensibility because I’ve just been re-reading Trent McConaghy’s paper on his ingenious Fast Function Extraction (FFX) algorithm.
Now the FFX algorithm embodies Occam’s Razor by strictly limiting the complexity of the expressions it builds. As a result, it can be remarkably fast and effective when you’re searching for relatively simple numerical expressions to fit large data sets in “normal engineering” situations. It doesn’t use “evolutionary dynamics” at all, but rather the straightforward combinatorics and modern nonlinear numerical curve-fitting methods that balance “making things more complicated” against “doing only enough to suit the situation”.
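Just to give a feel for that shape of approach, here’s a rough sketch in the same spirit. It is not McConaghy’s FFX code; it assumes numpy and scikit-learn are available, and the basis menu, target function, and regularization settings are toy choices of mine. The idea it illustrates: enumerate a small fixed menu of basis functions, fit sparse linear combinations of them at a few regularization strengths, and see how model size trades off against error.

```python
# A rough sketch in the spirit of FFX, not McConaghy's implementation:
# enumerate a fixed menu of simple basis functions, fit sparse linear
# combinations of them at several regularization strengths, and report
# how model size trades off against error.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, size=200)
y = 3.0 * np.sqrt(x) + 0.5 * x + rng.normal(0, 0.1, size=200)   # toy target

# Fixed, deliberately modest basis: the "lower-left corner" of model space.
bases = {"x": x, "x^2": x**2, "sqrt(x)": np.sqrt(x),
         "log(x)": np.log(x), "1/x": 1.0 / x}
names = list(bases)
X = np.column_stack([bases[n] for n in names])

models = []
for alpha in (1.0, 0.1, 0.01, 0.001):           # sweep the sparsity knob
    fit = ElasticNet(alpha=alpha, l1_ratio=0.9, max_iter=10000).fit(X, y)
    terms = [(n, c) for n, c in zip(names, fit.coef_) if abs(c) > 1e-6]
    error = np.mean((fit.predict(X) - y) ** 2)
    models.append((len(terms), error, terms))

for size, err, terms in sorted(models):
    print(f"{size} terms, MSE={err:.4f}: {terms}")
```

Sweeping the regularization strength is what walks you from the tiny, boring models up towards the slightly fancier ones, without ever leaving that lower-left corner.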
It’s a lovely and elegant little algorithm, and there’s a Master’s thesis for somebody’s Science Studies ethnography work if they’d only explain why this algorithm hasn’t made symbolic regression the go-to approach in machine learning systems across the entire community. Why is it so obscure, in the engineering literature?2
And as I mentioned last time, there are many perfectly reasonable situations in which to demand that an “automatically discovered controller” be simple enough to be obvious. If the only thing needed is to find the right constants to fit a specific model, traditional (linear, nonlinear) methods will do. If what’s needed is to simultaneously examine a small subset of concise and “reasonable-looking” models, and also to fit the parameters of those, then parsimonious approaches like ParetoGP and FFX are the right choice.
It isn’t just numerical methods, either. This approach of starting from pretty-good solutions and searching locally for “improvements” is what stokes the growing interest in Genetic Improvement, I think. “I don’t want you to reinvent the whole, I just want you to focus on and fix this little piece.” The GI folks don’t talk much about writing code from whole cloth, but rather about discovering fixes to a few failing unit tests, against a background of mostly-passing ones.
Think of those “self-driving” juggernauts that are on the street right now. Think of the internal controller for your pacemaker, or the automated trading system managing your “retirement” “savings”, or the semi-autonomous Mars robot that you don’t want to leap off a cliff chasing a butterfly. There are many situations where we don’t want to be surprised.
But then sometimes we do.
Pandora’s Black Box
As you might have guessed, I like to talk about the other thing.
I make loving fun sometimes of people who search for “general artificial intelligence”, but not because I think their mission is silly or impossible. Rather I make fun of them for trying to make things that look on their surfaces the way we imagine human beings “actually are”, which is a silly and short-sighted argument for an otherwise reasonable mission….
BILL
[waving fingers vaguely]
Neural networks, someday you will be a Real Boy.
I tend to make fun of people seeking “general intelligence” by incremental approximation of “real intelligence” because they are imagining that the bug-fixing of Genetic Improvement is analogous in some reachable way to writing whole programs from scratch. Making that sort of claim is dismissive of the other, non-coding work that programmers do: the social work, the design work, the lived experience of the Mangle of Practice they are dropped into when they wrestle an idea into some semblance of working code.
They say humans are “generally intelligent”. I suppose; I guess; sure, sometimes. Let’s look into that a bit.
We like to imagine that “we” (whatever that is) are “capable” of addressing many different kinds of problem. Now and then we come up with innovative and unusual approaches, “inventing” new things and methods ourselves. Those inventions are startling, in their way, though (you’ll notice) only once we start to understand how they work.
But when we do invent, it’s usually not by composing entire new representational ontologies, nor by building whole new compiled languages, nor with new mathematical frameworks… just to open the stuck pickle jar. We (the meat “we”) tend to use the same basic tool box for a lot of our work: these 32000 or so Unicode characters, a small pile of only the simplest digital computing logic gates, a few pounds of fatty meat. We tend to think of going from Point A to Point B only by car, or plane, or bus, or foot. We expect new stuff to be a lot like the old stuff we have already.
This isn’t a complaint. I’m just reminding us both.
Despite how infrequently we humans tend to do it, we like to think “exploration” to find “novelty” is inherently good in some situations. The vast majority of our lives is spent on what John Holland would’ve called “exploitation” (of well-formed nearly-good solutions), but now and then we need some “exploration” (of weird blue-sky shit) to knock us out of our comfort zone. That’s the argument, at least.
But we don’t tend to approve in practice of the really weird shit, do we? I mean here the paradigmatically weird shit, of the sort you only really learn when you take advanced Biology classes: the swarm intelligence of termite colonies doing “engineering design” of their mounds, the hydraulic actuator arms of cephalopods (covered in many cases with distributed “brain”), the invisible subsurface mycorrhizal network and aerial signals that connect a forest into a biochemical community sensing and responding on nonhuman spatial and temporal scales….
We’re bad at doing aliens. We only engage things that are like what we already know.
We can do “biologically-inspired”, of course. We have soft robots and swarm algorithms and ubiquitous mesh networks now, and all those have at some point drawn inspiration from the basic science of octopods and termites and ecosystems. But only after we noticed and then started to understand those natural systems.
So yes, we can sometimes copy alien things, once they aren’t quite so alien any more: once we start to tease apart how they work, what they’re meant to do, how to draw useful lines around their various parts, and what those parts do in service of the thing we think we’ve observed. We love Function, we think most of the time we need there to be function. Function is the language of not just engineering but science.
What is this for?
But I was trained up as a Biologist back in the day, at least until they kicked me out, before all this data-driven stuff we have now, when we had anecdote-driven observational biology… and we were happy. I’ve spent a lot of time looking at biochemical pathways: not that lame-ass dainty little Krebs Cycle like they taught you in school, but plant secondary metabolic networks, and fungal sporulation regulatory networks, and the places where DNA transcription works backwards.
They used to say, “The central rule of Biology is that everything has an exception.” And they were right, except not just Biology as it happens….
One of the (possibly specious) distinctions people tend to draw between Engineering and Science goes something like this: “Engineering is about what things are for, and Science is about what things are like”. That is, people of our era tend to think of Engineering as being a fundamentally problem-driven undertaking, where [first] a need demands a solution, and [then] a solution is built according to practically-constrained reasoning.3 And the same people tend to imagine Science as “looking at the world” to gather a body of observations, and from there formulating a big-H Hypothesis, and from there going on to Do Experiments to show how the world probably is or is not. The world ends up being like a certain thing, and from there… well, from there, you look at more stuff and try to get more ideas about what the world is like.
This grand dichotomy tends to leave Mathematics and Art and the Humanities aside, I note pointedly, and you might wonder a little bit in your spare time if maybe there could be a problem in that.
Let’s look at some anomalies, some bugs in this worldview as it were, and consider how we might maybe come up with a bugfix or two.
Consider Number Theory. Back in the day, Number Theory used to be a very abstruse thing, filled with insanely esoteric conjectures about prime numbers and weird gigantic numbers nobody could even write down in standard nomenclatures, and combinatorial objects and vanishingly small probabilities, existence proofs of things nobody could ever want, and so on. It was kinda Sciencey, in this regard: it was about what numbers are like, but then on top of that of course there’s that whole philosophical thing about why the world cares one whit about numbers in “real life”. So it was also tainted (from an Engineering standpoint) with Philosophy, a heinous taint indeed when it comes to Engineering and problem-solving, it seems.
Except of course this was all in the days before your refrigerator had a privacy policy, and before there were satellite constellations telling your telephone how far it was from the center of the earth, and back when banks were geographically-local entities that stored their accounts in ink made of physical molecules printed on pulverized trees and old cotton rags. It was before cryptography became Number Theory, when there wasn’t even an infrastructure on which such notions could be tested, let alone released as business models.
Things become useful. Contexts change. The conversation with the world goes on, and now and then somebody goes back into the esoteric historical dialog and notices that Hey the World was telling us about large primes and factorization decades ago, did you know that? Look here in these old-ass books nobody ever read!
I am tempted to drop a half-dozen other examples here, but you know how that would turn out. The premise would get squishy. I’d be forced to start talking in sidebars about “well it was more complicated than that, really,” and pointing out how the basic glib story was reticulated and anastomosed all over the place. Because of course the stories would start to be about how Charles Babbage made computers… but didn’t. Or how magainin antimicrobial peptides were discovered by a fluke of staring off into space at surgically-wounded frogs in a crap-filled aquarium who refused to get infections… but then failed to be commercialized. Or this or that or whatever.
In my hands, such a list would end up being another back-asswards introduction to the Mangle of Practice, Andrew Pickering’s lovely metaphoric model of how science—and engineering, which isn’t as different as you might think—is actually done and experienced. So maybe I should write that introduction forwards-frontwards instead.
I will do that. Next time.
But let me first finish my thought, the motivator, the thing that makes me fill this page with a few thousand words over the course of a long weekend:
Think of both Engineering and Science as conversations. Both make things, and those things we’ve made are what I want you to imagine we are talking to. The dialog is one where we, as Engineers or Scientists or Artists or Historians (the difference will disappear shortly) tell the thing we’re making, “I believe X about the world, and I am making you so that you——,” which is where the thing we’re making interrupts and says to us, “Nope. That’s not the way the world and I and you work. The materials from which I’m made, the things you want me to do, the constraints on my behavior and the limits on my performance (brought to bear by physics and chemistry and the nature of reality itself), that’s a huge nope, and that’s all I’ve got to say. Nope.”
The things we make resist our expectations.
If we are listening, and we are committed to making something—whether it’s a machine or a theory or a work of art or a consistent account of historical events—we should at that point adapt to suit the resisting nope. If (to use Pickering’s example) we thought glass and optics worked in some way, and the telescope we’re making says to us, “Nope I will show you blurry things, distorted and headache-inducing,” this is useful information for us to listen to. Maybe our sense of the machining tolerances is wrong, maybe our sense of what “good glass” is for this task is wrong, maybe we need to revise our model of light, maybe there is something in our model of light that we can use to understand why the edges of the things in the telescope are all surrounded by Roy G Biv rainbows instead of being clear like our eye might see them… oh wait, we say, that’s interesting….
Imagine the things we’ve made have voices, that they resist us on behalf of the way the world Really Is. Imagine too that they are not just talking to us when we are actively concerned with them, when we’re writing code and they are showing us flaws in our reasoning, or when we’re collecting data for models and they are talking about anomalies and outliers, or when we’re writing about a Great Man of History and the thing (the history) we are writing gestures blithely over towards the pile of letters from friends and relatives sitting accusingly in the corner of the archive, saying what an asshole the Great Man really was.
Not just then. Imagine them talking, representing, signifying their contrary versions of the world all the time. But we’re bad listeners, and we have short memories compared to physics and such. A lot of what the things we make say aloud doesn’t seem very useful to us at the time.
But all we’ve made and will make is constantly resisting our expectations. Sometimes more quietly, sometimes more loudly. We tend to ignore what’s not fighting us in the moment, but it’s still there, muttering. Now and then—as in the example of Number Theory, but also in the examples I elided—maybe it can be useful to look back, look far away from what we’ve been examining closely, and wonder what it’s been telling us all along.
See, I laugh at people looking for “general intelligence” in artificial systems that are designed by incremental approximation of “real intelligence” because it’s been here all along, in the discussions we have with the things we’ve always been making.
The other way to use GP, the one that isn’t parsimonious, incremental, White Box explainable modeling—see, I said we would come back to it—is not as a tool to “make general intelligence”. It’s as an amplifier for the Mangle of Practice, this argument we have with the world. It’s an interview process about the world, a survey technique that can help us sample a little bit from the vast combinatorial pile of the ways in which that world—represented by the things we’ve made, and which resist on its behalf—is saying something different from what we imagined was true.
It’s a way of making aliens. The “general intelligence” isn’t in the aliens themselves, nor is it in us. It’s in the dialog we undertake with one another, as we try to formulate the general principles of some particular task, and the things we’ve made say, over and over: Nope.
The biggest complaint I hear from the folks who use GP in White Box situations? When they show the things that have been discovered to an Expert in the Discipline, that Expert will say, “I would never have done it that way.” Sometimes, in the throes of a long and important conversation intended to make something useful, that is the right response: Go back to the simpler stuff you understand, and don’t drop down into basic science if you can make do with simple, linear approximations to the real world.
But other times, the right thing to do is to say the next sentence: “I would never have done it that way. I wonder why.”
1. That word “only” is in scare quotes because it’s doing an awful lot of work in that sentiment. Asking an artificial system to do no more than what we expect is itself a matter of some effort. ↩
2. Actually I have some thoughts on this, since Trent wrote this algorithm about the same time I was going back to school at the University of Michigan to start a PhD in Industrial & Operations Engineering. The habits of linear programming and integer programming and basic nonlinear methods are almost exactly like the ones I’m talking about in the GP world, but more so. People in most engineering fields have been raised up learning numerical methods that were resource-constrained in the 1970s sense of the phrase, and at the same time they were living in a world where certainty of optimality was demanded. The lower-left corner of complexity-vs-error space was even smaller for them, and more vehemently demanded by the discipline. You only tried something more complicated than Linear Programming when you’d shown it failed, and you used that because Great Men had proven it would always give you the optimal solution if it did work…. ↩
3. Just the other day at our GPTP workshop, there was a lot of discussion of how GP might be falling into the sin of “building solutions looking for problems”. ↩