
# Permutable programming languages

Draft of 2017.12.07

May include: genetic programming, &c.

If you follow the RSS feed, you may have noticed that I’ve been clearing out some small puzzles and open questions lately, like the thing about integer-stick graphs or the one about the Traveling Salesman with a GPS. I haven’t given up on the domino tiling thing either; I’ve got a few more short pieces to clear off my desk before I get back to that, since there’s a good deal of coding I want to do in order to transform it into a benchmark for GP.

But there’s never just one thread, with me.

I recently proofread a paper on Geometric Semantic Crossover, and it got me ta thinkin’ (as one says).

First of all, we’re in the world of genetic programming here. Probably way more than 90% of the people who even know about genetic programming will think it’s about evolving tree-shaped expressions, and they’ll probably think that “crossover” is a smart way of taking a chunk of one program (tree) and snipping it out of the donor and replacing some chunk of another program (tree) with that chunk as a new subtree. This is how Koza Himself described it, and few folks indeed have thought about changing it much.

This is a good place for an example. So let me write out a couple of trees, in the sense most people (see I keep repeating that phrase, so it starts to sound like foreshadowing I hope) think of them, and then draw a couple-three purdy pitchers.

Say we have two functions mapping $$\mathbb{R}^2 \rightarrow \mathbb{R}$$. That is, they’re functions of the form $$y=f(x_1,x_2)$$. Now the Koza-style innovation from 25 years back is to write these as Lispy S-expressions. Suppose we have

$f(x_1,x_2) = k_2 x_1^2 + k_1 x_1 x_2 + k_0$ $g(x_1,x_2) = \frac{k_9 x_1^2 + k_8}{k_7 x_2 + k_6 x_1}$

We can rewrite these as equivalent S-expressions like this, where I’m writing in a fakey kind of “normalized” form, where each operation has no more than two arguments:

f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) k0)

g = (/ (+ (* k9 (* x1 x1)) k8) (+ (* k7 x2) (* k6 x1)))


Now those S-expressions are, to a Computer Sciencer, trees. And “crossover” in the Koza-style tree representation world involves picking a subtree from each one, and swapping them.

Notice that syntactically this is feasible only because every subtree (atom or expression) returns the same type of thing, so any (+ A B) is just the sum of anything that fits for A and anything else that fits for B.

We can subdivide f in several places, plucking out self-contained subtrees willy-nilly. Here, I’ll walk through all of them and replace each with an A:

f = A
f = (+ A k0)
f = (+ (+ A (* k1 (* x1 x2))) k0)
f = (+ (+ (* A (* x1 x1)) (* k1 (* x1 x2))) k0)
f = (+ (+ (* k2 A) (* k1 (* x1 x2))) k0)
f = (+ (+ (* k2 (* A x1)) (* k1 (* x1 x2))) k0)
f = (+ (+ (* k2 (* x1 A)) (* k1 (* x1 x2))) k0)
f = (+ (+ (* k2 (* x1 x1)) A) k0)
f = (+ (+ (* k2 (* x1 x1)) (* A (* x1 x2))) k0)
f = (+ (+ (* k2 (* x1 x1)) (* k1 A)) k0)
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* A x2))) k0)
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 A))) k0)
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) A)


Those are all the subtrees (including the trivial root) of f, and if you want to see the torn-out hunks, here they are:

f = A
A = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) k0)
f = (+ A k0)
A = (+ (* k2 (* x1 x1)) (* k1 (* x1 x2)))
f = (+ (+ A (* k1 (* x1 x2))) k0)
A = (* k2 (* x1 x1))
f = (+ (+ (* A (* x1 x1)) (* k1 (* x1 x2))) k0)
A = k2
f = (+ (+ (* k2 A) (* k1 (* x1 x2))) k0)
A = (* x1 x1)
f = (+ (+ (* k2 (* A x1)) (* k1 (* x1 x2))) k0)
A = x1
f = (+ (+ (* k2 (* x1 A)) (* k1 (* x1 x2))) k0)
A = x1
f = (+ (+ (* k2 (* x1 x1)) A) k0)
A = (* k1 (* x1 x2))
f = (+ (+ (* k2 (* x1 x1)) (* A (* x1 x2))) k0)
A = k1
f = (+ (+ (* k2 (* x1 x1)) (* k1 A)) k0)
A = (* x1 x2)
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* A x2))) k0)
A = x1
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 A))) k0)
A = x2
f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) A)
A = k0


That starts to make my eyes hurt. But the point is we have computers to do that sort of thing for us, so now any of those A chunks from f can be swapped into some equivalent hole we make in g. Let’s take this one: A = (* k1 (* x1 x2)) and stick it into slot B in this: g = (/ B (+ (* k7 x2) (* k6 x1))). We can also stick the chunk B into slot A if we want, producing the complementary crossover product.

As a result, we get two new offspring:

h1 = (+ (+ (* k2 (* x1 x1)) (+ (* k9 (* x1 x1)) k8)) k0)
h2 = (/ (* k1 (* x1 x2)) (+ (* k7 x2) (* k6 x1)))


or

$\require{color} f(x_1,x_2) = k_2 x_1^2 + \colorbox{yellow}{k_1 x_1 x_2} + k_0$ $g(x_1,x_2) = \frac{\colorbox{yellow}{k_9 x_1^2 + k_8}}{k_7 x_2 + k_6 x_1}$ $h_1(x_1,x_2) = k_2 x_1^2 + \colorbox{SeaGreen}{k_9 x_1^2 + k_8} + k_0$ $h_2(x_1,x_2) = \frac{\colorbox{SeaGreen}{k_1 x_1 x_2}}{k_7 x_2 + k_6 x_1}$
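None of this surgery needs to be done by hand, of course. Here’s a throwaway Python sketch of the idea (my own toy code, not any GP library’s): it enumerates every subtree of a nested-list S-expression, just like the A-slot list above, and swaps a random pair between two parents:

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of the tree, root included."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):  # index 0 is the operator
            yield from subtrees(child, path + (i,))

def graft(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = graft(copy[path[0]], path[1:], new)
    return copy

def subtree_crossover(t1, t2):
    """Koza-style crossover: pick a random subtree in each parent, swap them."""
    p1, s1 = random.choice(list(subtrees(t1)))
    p2, s2 = random.choice(list(subtrees(t2)))
    return graft(t1, p1, s2), graft(t2, p2, s1)

f = ['+', ['+', ['*', 'k2', ['*', 'x1', 'x1']],
               ['*', 'k1', ['*', 'x1', 'x2']]], 'k0']
g = ['/', ['+', ['*', 'k9', ['*', 'x1', 'x1']], 'k8'],
          ['+', ['*', 'k7', 'x2'], ['*', 'k6', 'x1']]]

print(len(list(subtrees(f))))   # 13, matching the list of A-slots above
h1, h2 = subtree_crossover(f, g)
```

Since `random.choice` picks the slots, `h1` and `h2` here are one random pair of crossover products; the worked example above is just one of the 13 × 15 possible pairings of slots in f and g.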

The point of these gyrations is (eliding a lot of contingency and history) that if we have discovered two good models of our real data—f and g, here—then we can hope that recombining “parts” of those models in a reasonable way will produce other good models, because the recombined models will in some sense be “between” the others. There is some debate (heh) on the validity of this notion, not least because these are whopping big hunks of nonlinear mathematics we’re talking, and also because “between” is not an easy thing to draw when they’re arbitrary abstract trees. But it works, to some extent. Possibly for the reasons hand-waved at above, and possibly because it’s a handy way of finding new stuff that’s completely unlike stuff you have but which might capitalize on having some of the same terms. “Between”… well, that’s a topic for another day.

Except it’s not.

So now: Geometric Semantic Crossover. This is a much more real-feeling version of “between”. Briefly… as long as we’re talking about functions of this strongly-typed numerical kind, one thing we can do to look literally between two arbitrary functions on the reals is affine combination. That is, given our $$f(x_1,x_2)$$ and $$g(x_1,x_2)$$, we construct a new parametric function

$h(x_1, x_2) = \alpha_1 f(x_1,x_2) + \alpha_2 g(x_1, x_2)$

where

$\alpha_1 + \alpha_2 = 1$

Now for some pair $$(0 \leq \alpha_1 \leq 1, \alpha_2 = (1 - \alpha_1))$$, we’ve got a new parametric equation which is quite literally the linear “morph blend” of the two we started with. We’re definitely, intuitively, “between” them.
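In code, the “morph blend” is nothing but a closure over the two parents. A minimal Python sketch (the function name and the stand-in parents with made-up constants are mine):

```python
def affine_offspring(f, g, alpha):
    """Geometric semantic crossover on real-valued functions:
    the child is the pointwise blend alpha*f + (1 - alpha)*g."""
    def h(x1, x2):
        return alpha * f(x1, x2) + (1.0 - alpha) * g(x1, x2)
    return h

# stand-in parents with arbitrary made-up constants
f = lambda x1, x2: -12.5 * x1**2 + 3.0 * x1 * x2
g = lambda x1, x2: (2.0 * x1**2 + 1.0) / (0.5 * x2 + x1)

h = affine_offspring(f, g, 0.25)
# for 0 <= alpha <= 1, h(x1, x2) always lies between f(x1, x2)
# and g(x1, x2), at every single input point
```

That last comment is the whole point: unlike subtree exchange, this “between” is literal, pointwise, and guaranteed by construction.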

This turns out to be a good idea, especially for times when we’re working on symbolic regression.

## Yes, and…

You are here, and you are therefore not in the “way more than 90% of the people who even know about genetic programming” I mentioned above. You know that we don’t have to work with S-expressions, you know that we don’t have to deal with arithmetic, or strongly-typed number math, or Lisp. We can evolve general code, you and I.

We do that, sometimes, with Push.

Because at the moment I can’t find the blasted link to the good descriptions, let me review Lee Spector’s Push language, greatly simplifying for space.

Push is stack-based. That is, the program and the state are stored in a set of pushdown stacks.

Push is imperative. The interpreter begins with a program on a special stack called :exec. In each step of the interpreter, the top item is popped from :exec and interpreted.

If the item of interest is a literal, then it is pushed onto a different stack of the appropriate type. For example, if it’s a number, then we will push it onto the :number stack (which might be called :integer or :float or :scalar in various implementations, but you get the idea). If it is a boolean value (true or false), we push it onto the :boolean stack. If it’s a string, it goes onto :string, and so on. And yes, it’s a little more complicated than that, but not much.

If the item of interest we’ve popped off of :exec is a variable name, then we look up that variable’s value (the interpreter knows it) and push that onto the :exec stack. We’ll pop that next of course, and send it to the correct location on another stack because it’s probably a literal or something, right?1

If the item of interest is an instruction though, then we execute that instruction. Its arguments, if any, are popped from the appropriate stack(s), any side-effects happen immediately, and any results are pushed onto the :exec stack.2 So for example if the item of interest is the instruction :number-add, then we pop two :number items, add them, and push the result onto :exec.

This however is crucial: if any arguments for an instruction are missing, it acts as a noop. So if we pop a :number-add instruction off of :exec, and there are fewer than two numbers on the :number stack, we just throw the useless thing away and move on.

When we run out of items in :exec, we’re done. By convention, we can think of the top item on a salient stack as the “result” of running the program.

An example would be good about now, right?

Now in the full scope of Push—at least these days—there can be a lot of instructions and types. Some of the instructions do some pretty weird things like shuffling around items on the various stacks, but I don’t want to derail us here. So let’s stick to the simple case of arithmetic and imagine we have two stacks; :exec and :number.
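Those rules fit in a dozen-odd lines. Here’s a toy Python interpreter covering exactly that subset, :exec and :number plus two instructions; a sketch of the semantics just described, emphatically not Lee Spector’s actual implementation:

```python
def run_push(program, bindings):
    """Toy Push interpreter: two stacks (:exec and :number), two
    instructions, and the crucial rule that an instruction missing
    any argument is a silent noop."""
    exec_stack = list(reversed(program))   # top of stack = end of list
    number = []
    ops = {':number-add': lambda a, b: a + b,
           ':number-multiply': lambda a, b: a * b}
    while exec_stack:
        item = exec_stack.pop()
        if item in ops:
            if len(number) >= 2:           # enough arguments?
                a, b = number.pop(), number.pop()
                exec_stack.append(ops[item](a, b))
            # else: noop -- throw the instruction away and move on
        elif item in bindings:             # a variable name: look up its
            exec_stack.append(bindings[item])  # value, push it onto :exec
        else:                              # a literal: route it to :number
            number.append(item)
    return number                          # "result" = the top, number[-1]

print(run_push([2, 3, ':number-add', ':number-multiply'], {}))
# [5] -- the :number-multiply found only one argument and was skipped
```

Note the instruction results go back onto :exec, per the rule above, and get routed to :number on the following step.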

Let’s look at our old friends $$f(x_1,x_2)$$ and $$g(x_1,x_2)$$ again. We already know enough Push to deal with them. I will translate the S-expressions I showed above into one of the many Push versions, and we can walk through an interpreter running that program, step by step. Let’s also set k1 and k2 and x1 and x2 to some values, just for clarity’s sake:

f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) k0)

Say k0 = 0
k1 = 3
k2 = -12.5
x1 = 17
x2 = 1.5e-2

:exec [:x1 :x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number []

; we pop :x1, look up its value, push that
:exec [17 :x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number []

; we pop 17, it's a number literal, so we push it there
:exec [:x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number [17]

; we pop :x1 from :exec; it's still 17
:exec [17 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number [17]

; again
:exec [:number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number [17 17]

; :number-multiply is an instruction, so we pop its arguments and push its result to :exec
:exec [289 :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]
:number []

; 289
:number [289]

; :k2
:number [289]

; -12.5 is a number
:number [-12.5 289]

; :number-multiply!
:number []

; maybe we can skip some of these?
:number [-3612.5 ]

; yeah I'm skipping now
:number [17 -3612.5 ]

;
:number [1.5e-2 17 -3612.5 ]

;
:number [0.255 -3612.5 ]

;
:number [0.765 -3612.5 ]

;
:number [-3611.735]

;
:number [0 -3611.735]

;
:exec []
:number [-3611.735]


I think I did that right. Maybe I should plug those values into $$f(x_1,x_2)$$ and see?

Nah, I trust you will. Anyway, let me point out a few things.
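Well, fine, twist my arm. The plug-in check, in ordinary Python arithmetic, no Push involved:

```python
# plug the constants from above straight into f(x1,x2) = k2*x1^2 + k1*x1*x2 + k0
k0, k1, k2, x1, x2 = 0, 3, -12.5, 17, 1.5e-2
f = k2 * x1**2 + k1 * x1 * x2 + k0
print(round(f, 3))  # -3611.735, matching the top of the :number stack
```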

First, because I carefully (?) transcribed the function into “equivalent” Push code, I expect the thing on top of the :number stack to end up being the value of $$f$$ for those constant and variable values. But you’ll recall I said this was one of the many Push versions of this expression.

Here’s the Push program I started with:

[:x1 :x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]


Suppose I had written this one, instead, inserting five extra copies of :x1 at the front end of the program?

[:x1 :x1 :x1 :x1 :x1
:x1 :x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]


Those five :x1 variable names will place five copies of the value 17 on the :number stack at the start of execution. But then immediately afterwards, the program I gave you above will run. None of the instructions in that program access the :number stack “below” the original starting point, so those five 17 values will ride along during the entire program execution. Assuming I haven’t bolluxed it all up, we’ll end up in the state

:exec []
:number [-3611.735 17 17 17 17 17]


As I said, if we follow the convention that the “result” of running a program is the top item on the salient stack, then the top item here is still the original result.

Here’s another variant, in which I’ve sorted the numbers and instructions into two separate “piles” in the program:

f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) k0)

:exec [:k0 :x1 :x1 :k2 :number-multiply :number-multiply :number-add
:k1 :x1 :x2 :number-multiply :number-multiply :number-add]


Again, I hope I got that right. ¯\_(ツ)_/¯ But assuming I did, then that should be exactly equivalent to the first program, including its “immunity” to having things happen to the :number stack before it begins running.

There are several other ways, in the fuller Push language, in which we could build even more functionally equivalent programs. There is the :number-swap instruction, which pops two items from the :number stack and replaces them in the reverse order. There is :number-dup, which pops one item from :number and pushes two copies of it. And so on. There are many ways to say the same thing.

But there’s something more important here. Remember that rule, that if we lack any arguments for an instruction, we act as if it didn’t exist?

Let me walk through another program, one that’s simpler, to show the point before I ask my take-home question:

:exec [8 11 :number-add -3 :number-multiply -3 :number-add]
:number []

:number [8]

:number [11 8]

:number [19]

:number [-3 19]

:number [-57]

:number [-3 -57]

:exec []
:number [-60]


This program produces the “answer” -60. But every other permutation of that program will also produce an answer of some sort. Possibly an empty :number stack… and so we’ll have to ask ourselves what kind of a thing that is. But try a few:

:exec [:number-add -3 :number-multiply -3 :number-add 8 11]
:number []

:exec [-3 :number-multiply -3 :number-add 8 11]
:number []

:exec [:number-multiply -3 :number-add 8 11]
:number [-3]

; :number-multiply lacks one argument
:number [-3]

:number [-3 -3]

:exec [8 11]
:number [-6]

:exec [11]
:number [8 -6]

:exec []
:number [11 8 -6]


Despite the fact that we encountered two instructions that lacked arguments, the Push execution rules state we carry on as if nothing had happened. So we end up with an “answer” in this case of 11.

Here are some more permutations of [8 11 :number-add -3 :number-multiply -3 :number-add], and the resulting :number stacks we get when we run them:

[8 11 :number-add -3 :number-multiply -3 :number-add] -> [-60]
[:number-add :number-multiply -3 :number-add -3 8 11] -> [11 8 -3 -3]


If you keep playing with these, you’ll find that there are several different outcomes… but not as many as you might expect just by counting the permutations as such. To some extent that may be because I’ve got two copies of -3 and :number-add here, but that’s not really enough to explain the apparent redundancies. And a little bit (in this example) is due to the fact that addition and multiplication are commutative, so we get some order for free.
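That counting is easy to automate. Here’s a quick experiment in Python, re-sketching the toy arithmetic-only interpreter inline so the snippet stands alone (my code, not a real Push implementation):

```python
from itertools import permutations

def run_push(program):
    """Minimal Push subset: numbers are literals bound for :number;
    an instruction with missing arguments is a silent noop."""
    exec_stack, number = list(reversed(program)), []
    ops = {':number-add': lambda a, b: a + b,
           ':number-multiply': lambda a, b: a * b}
    while exec_stack:
        item = exec_stack.pop()
        if item in ops:
            if len(number) >= 2:
                exec_stack.append(ops[item](number.pop(), number.pop()))
        else:
            number.append(item)
    # NB: top of stack is the LAST element here, the reverse of the
    # bracket notation used in the prose above
    return tuple(number)

program = [8, 11, ':number-add', -3, ':number-multiply', -3, ':number-add']
outcomes = {run_push(p) for p in permutations(program)}

# 7 tokens with two repeated pairs give 7!/(2!*2!) = 1260 distinct orderings,
# but the set of distinct final :number stacks is far smaller than that
print(len(outcomes) < 1260)   # True
```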

But also notice that any program that ends in a literal or a variable name will define a constant function, regardless of any other items that might end up on the salient stack.

## What does a program do?

Recall that we started, here, with a “translation” of f = (+ (+ (* k2 (* x1 x1)) (* k1 (* x1 x2))) k0) into Push. When we back-translate that Push program to an S-expression, we’ll get something very similar to what we started from.

But when we back-translate one of these permuted programs, we’re getting something more like a series of unlinked S-expressions, aren’t we? For instance, suppose I permute my Push version of f above into

f = [:x1 :x1 :number-multiply :k2 :number-multiply :x1 :x2 :number-multiply :k1 :number-multiply :number-add :k0 :number-add]

permuted f = [:x1 :number-multiply :number-add :k2 :number-multiply :x1  :x1 :x2 :k1 :number-multiply :k0 :number-add :number-multiply]


When I run permuted f I’ll end up with three numbers on the :number stack: [(* x1 (+ (* x2 k1) k0)) x1 (* k2 x1)]. Because of failed (skipped) instructions and a weird rearrangement of successful ones, and assuming I only count the top item on the :number stack, then I’ve returned only a weird mutated chunk of the original function $$f(x_1,x_2)$$. I can’t really even highlight a stretch of the original function in the new one.

$f(x_1,x_2) = k_2 x_1^2 + k_1 x_1 x_2 + k_0$ $f'(x_1,x_2) = x_1 (k_1 x_2 + k_0)$

## So?

So. The question in my mind today (and one that Ron asked as well) is: What is the relation between a permuted Push program and its original semantic meaning?

Recall that I started because I had been reading work on Geometric Semantic Crossover, this affine combination of two (numerical) programs. And that literature was in response to the years-long habit from Koza’s original work of doing crossover between tree-based functions by subtree crossover.

What then is this permutation thing?

Imagine you had two programs. As in other cases, imagine also that these are interesting, potential solutions that maybe aren’t quite as good as you would like. If they’re represented as S-expression trees, you can play around with them by discrete subtree exchange, or by numerical semantic affine combination, looking for better things “between” them.

But in Push, you can jumble them up together, and every permutation is runnable. It may be that some don’t represent answers any more (we didn’t get an empty :number stack, but in a full Push implementation we certainly could have). But they’re inevitably syntactically correct.

This is the crucial thing: every Push program can be run. Not just programs involving arithmetic, but recursive programs, programs with :string and :matrix and :shape types and instructions. Push programs that manipulate Push code. Any Push program.

That seems important. I think maybe we should look at it a bit more closely, next time.

1. He said, pointedly….

2. Actually this behavior will definitely vary from implementation to implementation. The effect will be unchanged for most instructions, as long as we stick stuff back on the :exec stack. In many implementations, the results go to “the obvious eventual destination”, so I’m hiding some fiddly details here.