The Three Languages of Genetic Programming
Draft of 2012.05.22 ☛ 2015.03.17
[This is a draft of an introductory chapter of the book; expect some changes as I finish up the first partial release. Also, I note some links and cross-references don’t translate to the blog directly.]
I like to say that success on a GP project involves working in three languages. Look again at the 3×5 card, and you’ll see hints of them all there. I call them the Language of Answers, the Language of Search, and the Language of the Project.
The Language of Answers: What?
A GP system itself doesn’t “think”. It’s a system for accelerating the exploration of alternative answers to a formally-stated question. A single project will often shift to explore several different questions: matters of whimsical open-ended curiosity or earnestly dedicated purposive science. But you should pay attention to only one at a time.
To be able to treat any particular question using GP, and explore the vast number of diverse alternative answers, you will need to write code that embodies your interests and goals.
Setting up a GP project obliges you to do some coding work. Usually the majority of your work will be the design and implementation of a domain-specific language. It doesn’t have to be very complicated, but it will need the flexibility and capacity to describe any interesting answer to your project’s question of the moment—the “smart” answers, and also the “dumb” ones. After all: your problem is interesting because you don’t know which answers are smart and which dumb….
I prefer to call the scripts you write in this domain-specific language—as interpreted in the context of your problem—“Answers”. In the GP literature you’ll see them called “individuals” and “genomes”; those are historically important terms, but they carry a lot of potentially misleading metaphorical baggage. Here we’ll stick with the term Answers, to remind us that they are contingent on the problem you’re considering.
You’ll have seen folks listing some of the potential applications of GP, talking about evolving “programs” and “strategies” and “puzzle solutions” and “molecules” and “controllers” and “robots”… all kinds of complex actual things. As language-using humans, sometimes we mistake our representations of concepts for the concepts themselves. Remember: a script doesn’t do anything until we run it on a particular interpreter or compiler, and even then only with certain variables bound to meaningful values.
A strategy is a meaningless poem until you invoke it in the context in which it was conceived; we cannot meaningfully read a “pure strategy” without knowing what war or game or business it was meant for. A real molecule is not a string of “ACGT”, or even a pretty colored picture of little candy balls on sticks, but nonetheless when we “evolve molecules” we’re evolving balls on sticks or strings of letters… then interpreting those in some molecular simulation. A controller for a robot is only a string until you upload it to a physical robot or a simulation so you can see what might happen when it runs. A plan for trading stocks is meaningless—and risky—without also considering the particular historical context and the specific trade execution system for which it was developed. And so on. An Answer needs both pieces of infrastructure: a statement or script written in a domain-specific language (often one you design), and also a formal setting in which the function embodied in a script can be expressed and explored meaningfully.
If the Answers in your project are simple DNA sequence strings like `ACGTCTAGCA...`, you’ll also need to obtain (or write) a simulator that translates those strings into proteins, or folds them, or tests them for toxicity, or does whatever a computer needs to do in order to determine the salient aspects of their function. If you want to evolve robot controller scripts, you’ll need a real or a simulated robot that can execute your controller scripts and reveal their function.
This is true even for the simplest and most common application of GP1, symbolic regression—fitting mathematical functions to training data. The most common approach is to represent these mathematical equations as S-expressions, a form familiar to many Computer Scientists who learned to program in Lisp. For example, `( + x ( / 2 9 ) )` is an S-expression representing the function x + 2/9.
But notice that the S-expression script `( + x ( / 2 9 ) )` is not in itself the mathematical function x + 2/9 (unless you happen to be running a Clojure interpreter in your head or something). Even though it’s very close to the runnable code, it’s not fully an Answer until you express it by parsing and evaluating it in an interpreter—one in which x has a number assigned to it.2
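To make that distinction concrete, here is a minimal sketch of such an interpreter. It’s written in Python rather than the Lisp setting just described, and it uses nested tuples to stand in for the parenthesized script; both choices are illustrative assumptions, not the book’s implementation.

```python
import operator

# Map S-expression operator tokens to actual arithmetic functions.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(expr, bindings):
    """Express a script as a value, given variable bindings."""
    if isinstance(expr, tuple):          # compound: (op arg arg ...)
        op, *args = expr
        return OPS[op](*(evaluate(a, bindings) for a in args))
    if isinstance(expr, str):            # a variable reference
        return bindings[expr]
    return expr                          # a numeric constant

script = ("+", "x", ("/", 2, 9))         # ( + x ( / 2 9 ) )
value = evaluate(script, {"x": 1.0})     # 1.0 + 2/9
```

The `script` alone is just data; only the interpreter, plus a binding for x, turns it into a function value—which is the whole point.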
Even when there’s a “general-purpose” GP-ready full-featured language available—something like Clojush or even a human-readable language like Java—you’ll usually need to expand it with libraries or custom code to include domain-specific vocabulary. And for reasons we’ll discover in the first project, sometimes when you use a full-featured language, you’ll also need to trim back its capacity.
Focus for a moment on the phrase “domain-specific” and how it needs to cut both ways: You don’t typically find `for...next` loops or set-theoretic operations in symbolic regression projects, because people are asking for arithmetic Answers, and those people rarely see `for...next` loops in arithmetic. You can fit data algorithmically using loops and Boolean operators and bit-shifting—after all, that’s how computers themselves do it. But you won’t find a `shift_right` operator in most off-the-shelf symbolic regression packages, because Answers that used it to explore the problem would “feel weird”.
If you’re working on a project where you want to explore string-matching algorithms to classify DNA into genes and introns, your Language of Answers will probably include something about regular expressions. Not a lot about `cos()`.
If you’re working on a project where you want to explore game-playing algorithms for a text-based dungeon crawl, your Language of Answers will probably include primitives like `fight`. And maybe if you’re fancy, you’ll roll in a library for creating decision trees so your adventurer can learn. But again, not a lot of `cos()` happening in the ol’ Crypt of Creatures.
And just to prove I’ve got nothing against trigonometry as such: If you’re working on a project where you want to explore the set of plane geometry diagrams which can be constructed using a straight-edge and compass, you will almost certainly want some `cos()` floating around in the mix.
So GP is “automated” how exactly?
No escaping it. In almost every GP project, you will need to hand-code this Language of Answers. Both parts: not just the “scripts” but also the contextualizing system used to interpret scripts and express their functions meaningfully.
Does this seem like a lot of effort? It’s not, when you put it in perspective. Realize that when you explore a problem with GP, you should expect to examine millions of alternative Answers. In traditional approaches to problem-solving, you might (if you’re Ever So Smart) be able to consider a few dozen—the ones you can keep in your head and notebooks. Even if you use algorithmic tools like linear programming, realize they are parametric explorations of different constant assignments… within one Answer at a time.
If you want access to the millions instead of the dozens, you need to put in the up-front work to programmatically represent the structure of Answers, and also hook up the mechanisms needed to express them functionally. That’s the investment you make.
The Language of Search: How?
“Language of Search” is my catch-all for the innumerable tricks of the GP trade. I count anything that changes the subset of Answers you’re considering, including random guessing and assigning them a score based on their performance in context.
There’s all the familiar biologically-inspired stuff like crossover, mutation, selection, and the more fanciful manipulations. And also the idiomatic tools we use to implement learning or evolving or improving: populations, back-propagation, selection, statistical analysis, 1+1 Evolutionsstrategie…. Basically anything and everything that reduces the amount of personal attention you need to pay to all those alternative Answers.
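Of the idioms just named, the 1+1 Evolutionsstrategie is small enough to sketch whole: one parent, one mutated child per step, keep whichever scores better. In this sketch the “Answer” is a plain list of numbers and the score is a sum of squares to be minimized—both stand-ins (my assumptions) for your own Language of Answers and its scoring context.

```python
import random

def one_plus_one_es(score, parent, steps=2000, step_size=0.1, seed=0):
    """Minimal 1+1 ES: mutate, compare, keep the better of the two."""
    rng = random.Random(seed)
    for _ in range(steps):
        # mutation: jiggle every number by a small Gaussian amount
        child = [g + rng.gauss(0.0, step_size) for g in parent]
        # selection: the child replaces the parent only if it scores as well or better
        if score(child) <= score(parent):
            parent = child
    return parent

sum_of_squares = lambda answer: sum(g * g for g in answer)
best = one_plus_one_es(sum_of_squares, [3.0, -2.0])
```

Everything else in the catalog—populations, crossover, demes—is a variation on this same loop of propose-compare-keep, with more bookkeeping.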
There is no particular “right way” to use or combine these components. They’re really all design patterns, and they are used differently in different geographical regions and schools; they are most like the mythic martial arts styles you see in movies, and the particular moves one school or Master may teach his students. But just as the martial arts share a purpose (if not an attitude), the many parts of the Language of Search address one question: Based on what you have discovered already, how do you identify new Answers that will be more satisfying?
Every GP project uses selection in one form or another, so let’s look at that more closely. Say we’ve built a GP system with a population of 100 Answers, and we want to design a process to pick “parents” in order to breed a new generation. There are literally hundreds of approaches, but here are four. We might:
- …pick two parents with equal probability and remove them from the population; breed them to produce two or more offspring; keep the two best-performing family-members (including, possibly, the parents), and replace those winning family members back into the population.
- …pick two parents randomly from the population, using a bias towards better-scoring ones; breed those two parents to produce one offspring, and set it aside in a new “generation”; continue (with replacement of parents) until you have as many in the next generation as you did in the last.
- …pick two parents at random from the population, with uniform probability; breed them, and return the parents and the offspring to the population; continue until the population size is doubled; destroy half the population, culling it back down to the size where it started.
- …pick ten different individuals from the population with uniform probability; choose the best one of that tournament to be the first parent; repeat for the second parent; breed, and then… (&c &c)
These are all perfectly reasonable and practical ways of choosing answers to breed and cull from a population. Three of them have formal names, even. Occasionally one may feel “better” than another for a given project, but none is intrinsically better in all situations.
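As a concrete sketch, the tournament scheme in the last bullet might look like the following. The integer “Answers” and identity scoring function are placeholder assumptions; your own population would hold scripts, and `score` would run them in context.

```python
import random

def tournament_pick(population, score, size=10, rng=random):
    """Sample `size` Answers uniformly (with replacement); keep the best."""
    entrants = [rng.choice(population) for _ in range(size)]
    return max(entrants, key=score)    # "better" = higher score here

population = list(range(100))          # 100 toy Answers
parent = tournament_pick(population, score=lambda a: a)
```

Notice the selection pressure comes entirely from the tournament size: larger tournaments favor the best scorers more strongly, while a tournament of one is plain uniform sampling.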
My point in listing them is to highlight the obvious fact that they’re all just recipes in a formal language: the language I’m referring to as the Language of Search. The “primitives” in this language are things you can surely see in my verbal descriptions: evaluation, subsetting and sampling, breeding (itself a whole blanket process that usually refers to “mixing up Answer scripts with one another”)… and of course the basic programming infrastructure of iteration and conditional execution and sorting.
All normal computer programming stuff, though maybe a bit more stochastic than you’re used to. But note that the Language of Search isn’t limited to code: There’s an important class of GP systems known as “user-centric” or “interactive”, in which a real live human being makes conscious decisions as part of the algorithm. This is a valuable tool for exploring matters of aesthetics and subjective judgement. (And we’ll build something like that in a later project.)
The Language of Search is huge, but it’s not onerous. While you almost always need to design and implement your project’s Language of Answers, even the most “advanced” tools in the Language of Search toolkit are simple in comparison. Things like “chop up a string and mix up the parts” or “change a token in a script to a random value” or “assign a score to an Answer by running it in context, given specific input conditions”.
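Those two quoted tools really are that small. Here is a hedged sketch of each, acting on Answer scripts represented as flat token lists (an assumption for illustration; your scripts might be trees or strings).

```python
import random

def one_point_crossover(mom, dad, rng=random):
    """'Chop up a string and mix up the parts.'"""
    cut = rng.randrange(1, min(len(mom), len(dad)))
    return mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]

def point_mutation(script, vocabulary, rng=random):
    """'Change a token in a script to a random value.'"""
    mutant = list(script)
    mutant[rng.randrange(len(mutant))] = rng.choice(vocabulary)
    return mutant

kid_a, kid_b = one_point_crossover(list("AAAA"), list("BBBB"))
mutant = point_mutation(["x", "+", "2"], vocabulary=["x", "y", "2", "9"])
```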
When I keep saying GP is simple, that’s what I mean: the Language of Search is simple. It’s really just a big catalog of small parts you cobble together, and there’s absolutely no reason you should try to learn all the tools anybody has ever tried, or use more than three or four basics in a given project.
And that would be your cue to ask: Why then does GP have a reputation for being so hard?
I’m glad you asked….
The Language of the Project: Why?
Almost all GP writing focuses on the Language of Search, either spelling out new tools and algorithms, or having little benchmarking contests between variations. A bit of the writing—mostly theoretical Computer Science—touches on the Language of Answers under the heading “representation theory”.
As far as I know, very little has been written about this stuff I’m calling the “Language of the Project”. Yet I argue it’s the most important of the three—not least because it’s the deciding factor when it comes to predicting whether a project will succeed or fail.
The Language of the Project is the language we use to talk about ourselves, in our roles as part of the project. It’s the framing we use to express what we want, and why. It’s our expression of the reasons one Answer is more satisfying than another, and our consideration of the possibility that no satisfying Answer exists. It’s the language we use to process the surprises GP inevitably throws our way.
Big chunks of my Language of the Project fall in the realm of well-studied disciplines: “user experience”, “project management” and “domain modeling”. Why do I feel it’s important to concoct a catch-all neologism just to lump together those esteemed fields for this special GP junk? Worse: why is a technical computing book about “artificial intelligence” getting all touchy-feely and psychological?
Simple answer: Because people don’t like being surprised.
That may ring a bell, since when you check you will see that the subtitle of this very book is “The Engineering of Useful Surprises”. And I specifically argued earlier that GP is “a prosthesis for accelerating innovation”—innovation in the sense of surprises.
Yup. And that’s the biggest obstacle in the way of broader adoption of GP, and also the biggest obstacle you personally will have working on your own projects: People don’t like being surprised.
You and your habits
A lot of folks seem to have decided that GP is “automatic”; that it’s used for “automatic search”, or “automatic programming”, or building “invention machines” that spit out inventions that are of “human-competitive” quality. Those folks won’t think my third Language is worth their attention.
To them, GP—and artificial intelligence more generally—is a sort of self-contained box of magic thinking stuff. I wonder if maybe those people have read post hoc reports of successful GP (or AI) projects, without considering all of what happens over the course of an actual project: a lot of non-artificial human thinking, typing, compiling, swearing, whiteboard-scribbling, and conversation… filtered through a series of iterative programming attempts and arguments and writing, until eventually an encouraging result was published. If you don’t count that as part of the project, then of course you shouldn’t think a GP (or AI) system includes the project team rewriting the algorithms, or the planning sketches, or the conversations and reading, or the re-starts with different settings to try to get more consistent results, or the statistical analyses trying to “tune” or “speed up” the thing, or even the story written down in the paper that describes what happened.
And if you’re willing to draw the boundary around the system that way, in a way that leads you to think GP (or AI) is a self-contained magic box of thinking stuff that people stand in front of and pat and hug and eventually coax intelligence out of… well you ought to get started now, because time’s-a-wastin’.
But while you’re occupied in patting and fostering self-organized creative urges, muse about it my way for a minute.
Recall that the Language of Answers is something you will almost always build from scratch. It’s not just domain-specific, it’s often problem-specific. The only time you can get away with using a pre-cooked Language of Answers is when you’ve unconsciously selected a problem that makes it easier to stomach reuse, or reduced the domain-specific qualities to raw numbers and
Given that reminder: How do you design the constants, variables and operators to use in your project’s Language of Answers? Which instructions will be more helpful in making interesting Answers? Which will be too weird? How do you ensure every Answer will be syntactically correct, or semantically consistent? Or do you have to? How do you know whether your Language of Answers is capable of representing any satisfying Answer at all, let alone an “optimal” one? How do you tell when your GP system is ignoring important tools you want to see it use, and what do you do about it?
Those are questions from the Language of the Project. No matter where you draw your system lines, a person needs to ask and answer these questions. Every time, for every project, for every problem. And a person needs to design and implement the solutions to them, using the other tools at their disposal. None of that is “automatic”.
And you may also recall that the Language of Search is a bulging toolkit, full of literally thousands of design patterns and rules of thumb for manipulating answers in context-dependent useful ways. I can describe sixteen mutation algorithms without breaking a sweat; then you’ve got crossover, and simulated annealing, and steady-state population dynamics, and demes, trivial geography, hill-climbing, initialization biasing, multi-objective sorting, particle swarms, automatically-defined functions, vertical slicing, age-layered populations…. Any riffle through any GP book will give you fifty more.
Given that reminder: How do you pick the mechanisms for search and learning in your project? How do you know which combination may be best or even useful for your problem? What do you even watch in order to decide whether a GP search is “working” or not? Should you let your current search run longer, or start it over again? If you start it over, should you change the parameters a bit, or try a different design pattern? What do you do when it gives you an answer that “solves the problem” in a totally stupid way?3
A person needs to mindfully adapt the structure of the project to fit the dynamic context of their wants and knowledge, and manage the system into giving them the answers they will find satisfying.
My “Language of the Project” isn’t identical with user experience, or project management, or domain modeling, or even their union. Those disciplines are admirable, but they are designed for unaccelerated human-powered projects.
“Excuse me: What just happened?”
You write software. I know this, or you wouldn’t bother reading this far. If a project isn’t giving you satisfying answers—whether it involves GP or not—then you (personally) need to check that it’s implemented correctly. And when you’re convinced it is running as intended, you then (personally) need to reflect and decide whether it’s doing what you want it to. And if you decide that it isn’t, then you (personally) need to either change how it’s written, or change what you think it’s for.
In non-GP projects—software development or financial or home improvement or medical research projects—there’s a reasonable sense that one can “re-start”. But of course in the context of human-powered projects, “re-starting” is never misunderstood to mean “from the same initial conditions”. You (personally, with all the other human beings on your team) “re-start” having learned something useful and helpful. You intend to do something differently the second time around, and you don’t have to concentrate very hard on remembering to change stuff.
This difference between you-before-the-first-try and you-after-the-first-try doesn’t get mentioned, because it’s such a fundamental fact of life that it goes without saying. But notice that you (personally) are understood intuitively to be part of the problem-solving system before and after the “re-start”.
Just the other day I was working on the code for a later section of this book: the part where we will evolve Conway’s Game of Life. I found that the GP system I started with was having a lot of trouble producing interesting answers. I worked a few days, trying to get it to do what I expected.
And then I realized that it had been working the whole time. I mean totally working. It gave me the best possible answer, every time.
Only then did I realize that the question I was asking was super boring. There was only one right answer, and the GP system I built kept giving me that answer. Immediately.
Now if you are one of the folks who want to think GP is a self-contained box of magic thinking stuff, this might seem like a good outcome, and not a problem. Who wouldn’t want an “optimization algorithm” to give them The One Right Answer?
Well, me. And you, I expect.
I would sound like this, if I were on stage at the Amazing Answer Machine Show: “Ladies and Gentlemen, I am thinking of a special algorithm! I have provided this, The Box of Magic Thinking Stuff, with 512 carefully-chosen examples and a collection of useful tools, none of which in itself is the algorithm. By recombining those tools in a very complicated way while I stand over here, The Box will now guess the function I’m thinking of in a matter of mere moments….”
A card trick. Boring.
What did I do then? I revised my notion of the project’s goals. I “re-started”, and in doing so I changed the story I’d been telling myself, the questions I was asking, and I expanded the Language of Answers accordingly.
The answer my GP system gave me was a surprise. One I wasn’t mentally prepared to understand, not least because it happened in a matter of seconds where I was expecting it to take some time. When I finally parsed what it kept repeating, I had a second surprise: the question I had asked was boring.
If I had been working in a traditional unaccelerated way—with a whiteboard or a yellow legal pad, chewing on the end of a pen and pacing with my hands behind my back like a think-tank caricature—I might have frowned and erased some stuff, or crumpled up a page or two and made a cup of tea.
I wouldn’t have been surprised.
Introspection is hard. Most people, for whatever reason, don’t like to question their assumptions. They like certainties and provable correctness, familiar models and known best practices, mathematical rigor presented on a buoyant comfort-cushion of assumptions.
That’s what I mean when I say they don’t like to be surprised.
Surprises aren’t just pleasant eureka moments, they’re also the oh shit moments. GP can be useful as an “innovation prosthesis” because it shortens the time between those eureka surprises.
GP feels complicated and difficult and annoying because it also shortens the time between oh shit surprises. And it can’t tell the difference.
GP projects often fail because novices run into oh shit surprises before any eureka ones. They’re culturally maladapted to cope with this disorder: they’re often Very Smart Computer Scientists or early-adopter domain experts, and they can pick up some information from the books or the nerds down the street, and they start dabbling in what I’ve called the Languages of Answers and Search.
But nobody ever tells them about these inevitable oh shits.
I’m going to focus on this cobbled-together “Language of the Project” exactly because of those issues. I’ve watched dozens of Very Smart engineery people dive in and (metaphorically) drown in GP. We need to erase the traditional boundary between what you think of as “the project” and you (personally), the “researcher”.
This is not to advance some Agilist social agenda; it’s a coping mechanism. Your best and most useful habits as a Very Smart Person are based on your experiences thinking very hard and hand-coding solutions to problems one at a time, and considering a few dozen alternatives. Without any thousand-fold enhancement.
I see it often: Smart person downloads some package; writes some code; follows along with a tutorial and builds a GP system and—boom—it starts spitting out ten thousand reasonable-sounding solutions every hour. Already they’re way outside the range of what their habits prepare them for. But they’re Very Smart, and so they look at the answers they have so far, and they fiddle with some things and change some parameters… and—boom—in an hour they have ten thousand completely different answers.
When it works, answers emerge from a GP system, in the sense of emergent behavior. Good Answers and bad ones. But realize they can’t emerge from a GP system of the sort I’m teaching you about—the sort that includes you (personally) as one of the components—until you (personally) examine those Answers and eventually decide you’re satisfied. You can’t succeed unless you can cope with the acceleration.
Here’s one of the core questions in GP (and AI) research, a deep and troubling one that many man-years of research have been spent considering: How do you know whether you should (a) keep a GP system running, on the off chance it will get better soon and give you new unexpected answers, or (b) stop it and start over from different initial conditions?
If you think GP (and AI) is a self-contained magic box of thinking stuff: You don’t.
If you realize you’re a core component in the GP system: Pick the one that is more satisfying to you at the moment, and try the other if that doesn’t work out.
And here is a deep-rooted problem affecting all of search and optimization, not just in AI but in all computational approaches: How do you know a priori which search technique will provide reliably better answers for a given problem?
If you think of the program as a self-contained box of optimization tools (and magic thinking stuff), the proven4 answer is: You can’t.
GP is simple. Regular old human-scale problem-solving is hard enough that people will tell you you’re a Very Smart Person if you demonstrate even occasional competence. But coping with a thousand-fold acceleration will break your model of yourself and what you think you’re doing.
So. Let’s start breaking.
So common that the old Wikipedia page for Symbolic Regression now redirects to the one for Genetic Programming. Am I allowed to put a “facepalm” in a book? ↩
I worry there’s a bit too much subtlety here: In some projects, an Answer may well be a formal function that is not evaluated with variable assignments—a project involving algebraic transformations, for example. It’s the goal of symbolic regression to fit particular training and test data; assigning those particular values is part of interpreting an Answer in that context. ↩
Let me share a symbolic regression result I was given by a system I was testing. I was just putting it through its paces, and so I was looking for functions that fit ten sampled data points from . It came up with the perfectly reasonable answer that started with and went on for four more lines after that. When I simplified it, it meant the same thing as , although along the way it added seventeen constants together, multiplied them by 166, and divided by a huge number to multiply some extra terms by 0. This was the sort of surprise I mean. ↩
This is an important result, and it pisses people off because it challenges some of the same models of self and project that I’m calling into question. It’s known as the No Free Lunch theorem for search and optimization. Among other things, it demonstrates that for any performance criterion you can develop, the average performance of any search algorithm—over all problems—is no different from the average performance of any other algorithm. ↩