(Almost) evolving Artificial Chemistries

Draft of 2016.07.29

May include: GPAIlearning in public&c.

Continued from “Running Register Machines”

Today I managed to subvert my weekly phone call with Nic McPhee into a pair-programming session, and we split the time equally between this RegisterMachine process and some work he is doing with SMAC. I’ll talk about the former here—not least because the latter was kind of sort of just trying to make it work and stuff but not succeeding.

First thing I did was talk him through the earlier episodes, and what a RegisterMachine is supposed to do. First thing he does is point out that this is surprisingly similar to a language developed by a faculty member at his graduate school. The language is called UNITY, and like these RegisterMachine things it has the characteristic of permitting and encouraging random order of execution of steps. Not only have I never heard of UNITY, there are no references to it in either Wolfgang’s paper or Wolfgang’s book on Artificial Chemistries! This is surprising, and Nic makes a note to send Wolfgang an email about it. I’ll have to look more closely, but I think there may be some interesting conceptual overlaps there, and it might benefit both groups. Assuming anybody is using UNITY any more.

Another thing Nic asks, as I explain the various functions I’ve written so far, is whether I might consider making the rm-or and rm-and functions work something like the and and or functions in Ruby, which short-circuit with pass-through (or words to that effect). That is, in Ruby, y = 1 or 2 will set y to 1, and y = 1 and 2 will set y to 2. That’s OK, but it’s not the same logical operator behavior described in the paper, so I’ll add that to the “maybe as soon as we see it needs help” stack of additional instructions to add when search fails to be interesting enough.

On that same subject, he and I also agree that probably there should be some kind of “transfer” function, something that just takes one register’s value and sticks it into another register, so it can propagate around and maybe evolution can develop some kind of long-term storage. After all, we’re thinking about putting the inputs into the :connectors registers, and they almost certainly can be overwritten unless they end up being stored in some other redundant register. Put that on the “maybe as soon as we see it needs help” stack also.

Finally (Nic is at least as good at riffing out the old ideas as I am) he wonders if maybe something to re-load the inputs would be important. Some kind of instruction that drags the original values for a register (set at initialization) back into play. Again, I see it, but in this case doesn’t it sort of imply—at least in the object model described in the original work—that the inputs are therefore “stored” in some read-only registers, and that there are instructions for dragging them from read-only to the place we put them in the :connectors registers? Probably. Same list.

Cleanup on aisle Optimism

As we’re reviewing the troops, an amusing coincidence happens.

Remember how I wrote this test so I could check whether invoke-any-step would actually change something in the :connectors array? And I said that I ran it a whole bunch of times and never saw it fail because of a random chance non-changing arithmetic operation?

(fact "I can invoke a random program step of a RegisterMachine, and it will make a change"
  (let [rm (->RegisterMachine
              (into [] (take 5 (repeatedly #(rand 100.0))))
              (into [] (take 5 (repeatedly #(rand-int 100))))
              (random-program all-functions 5 5 20))]
    (count (clojure.set/intersection 
      (into #{} (:connectors (invoke-any-step rm)))
      (into #{} (:connectors rm)))) => 4

Well, this was when it failed.

Nic admonished me to just tear the damned thing out, and so I did. It’s good to have these tests in place while I’m developing code in my customary dark cloud of uncertainty, but they don’t need to stick around after other things work that use the code I’m uncertain about. I take his advice immediately, and it’s a goner.

And now to work

We decide to pursue evaluation before we pursue search. That is, we want to be able to set the inputs of a RegisterMachine (whatever that means), and examine its output (whatever that means), and assign an error score when comparing the observed output to the desired one for a given training case. That’s how fitness scores go, after all.

Sadly, there is absolutely no guidance in the paper from which I’m working. In fact, I look over some of the other papers in Wolfgang’s bibliography and don’t find anything more informative there either. So Nic and I chat a bit about it, and wonder whether we should make some kind of special output register, or pick one of them by convention: the first, the last, the middle one? Since my implicit guide here is the book chapter, and there is no mention whatsoever of “special output registers” there, I opt for the second one. We’ll say, WLOGWH (without loss of generality we hope), that the last register of the :connectors vector is the one specially marked “output” in every RegisterMachine everywhere forever.

That immediately suggests a simple function to ask a RegisterMachine to provide an “output”:

(defn output
  "Returns the current value of the last `:connectors` element of an RegisterMachine"
  (last (:connectors rm)))

The next thing to decide—again because Wolfgang and Christian didn’t actually seem to write it down here—is what to do with inputs. Recall that a RegisterMachine has two kinds of registers: :read-only (11) and :connectors (30). It’s tempting to think the inputs should be stored in :read-only, so they can be preserved for the duration of running a program… but the paper specifically says that :read-only is for evolved constants, not inputs. So we’ll put the input values into :connectors.

Then the same problem about where. The two example problems have very different argument signatures: symbolic regression of \(y=sin(x)\) has one input, and the Thyroid Disease classification problem has 6 real and 15 binary inputs. It doesn’t feel right to put them at the end of :connectors, so we decide (again WLOGWH) to stick them at the front of the vector.

This raises the question of what exactly we’re asking evolution to do for us, which is a question everybody doing genetic programming should have in mind. Yes, we say on the surface we want it to “learn from the data”. But we can also say that we want it to “learn to connect the inputs to the outputs”, and in the case of a RegisterMachine with random ProgramStep items, these are not necessarily connected portions of a graph; there are 100 ProgramStep items, but since they’re random and directed, the resulting hypergraph may not even be able to write to the last :connectors register!

But evolutionary search is powerful, and the paper says it works. We’ll see. It’s something to watch out for, is all.

So we need a function to assign new values to the “input part” of the :connectors vector in a RegisterMachine:

(defn set-inputs
  "Takes a RegisterMachine and a vector of input values. Replaces the first portion of the `:connectors` vector with these values."
  [rm inputs]
  (let [i (count inputs)]
    (assoc rm 
           (into [] (concat inputs (drop i (:connectors rm)))))))

Run that program

Now we have inputs set and output watchable. Let’s run programs! Wait. How do we do that?

(defn output-given-inputs
  "Takes a RegisterMachine, a count of steps to run it, and a vector of inputs. Overwrites the `:connectors` vector of the RegisterMachine with the inputs (starting at the front end) and executes the program steps at random. Returns the value of the last element of `:connectors` at the last step."
  [rm steps inputs]
    (invoke-many-steps (set-inputs rm inputs) steps)))

That’s how! How do we know?

(fact "output-given-inputs steps a bunch and returns a number"
  (let [rm (->RegisterMachine [1] [1] [(->ProgramStep +' [0 1] 0)])]
    (output-given-inputs rm 5 [99]) => 104
    (output-given-inputs rm 1 [99]) => 100
    (output-given-inputs rm 0 [99]) => 99

So this RegisterMachine is a nice simple testing example, which we can call “counter”. It has one constant in :read-only (the value 1), one value in :connectors (starting with 1, but this will be overwritten by the input), and a single ProgramStep that adds the constant to the item in the first position in :connectors, and saves the result in the same location. In other words, it increments the value in :connectors on every step.

In the tests we are also exercising the set-inputs function. That vector [99] is our one input value. It overwrites the original value of 1 in :connectors, and then we execute some steps of the ProgramStep. As a result, we start counting at 99, adding one for each step. And finally we return the output, which is the “last” element of :connectors by convention. Which in this case is also the first element, and so we get back the new sum.

Up and down

Feeling a bit cocky, we decide to try evaluating a RegisterMachine for an entire suite of training cases. Just to give ourselves a fighting chance, we pick the \(y=sin(x)\) example, and generate a bit of training data to work with:

(defn random-sine-case
  (let [x (- (* 2 (rand Math/PI)) Math/PI)]
    [[x] (Math/sin x)]

(def sine-data
  (repeatedly 100 random-sine-case))

In other words, we’re picking a hundred random x values in the range \([-\pi, \pi]\), and associating those with the desired output values, \(sin(x)\). The data structure we hit on is a bit odd; it’ll look like this:

( [[x1] y1] [[x2] y2] [[x3] y3] ...)

That is, it’s a collection of tuples of inputs and output values, with the inputs represented as a Clojure vector. That way (we think) we can use it more easily to overwrite the vector :connectors when we initialize a test. As it happens, that’s not entirely the case, and we end up doing some of that awful Clojure (into [] ...) collection-casting that anybody who ever programs Clojure ends up hating but doing as a reflex. But it works.

From there, we can easily “map” the RegisterMachine over the entire set of training data, producing a collection of results:

(defn output-vector
  "Takes a RegisterMachine, a number of steps, and a training data set. Returns a collection of output values, one for each of the training cases."
  [rm steps data]
  (map #(output-given-inputs rm steps (first %)) data) )

And then it’s just a hop and jump to a function that returns a collection of absolute errors, for a given RegisterMachine:

(defn error-vector
  "Takes a RegisterMachine, a number of steps, and a training data set. Returns a collection of absolute errors, comparing the output observed for each training case to its expected value."
  [rm steps data]
  (let [outs (output-vector rm steps data)]
    (map #(Math/abs (- %1 %2)) outs (map second data))

We have tests for all these, of course, but I suspect they’re getting boring by now. So I jump the gun a bit (Nic is scandalized a little, I think) and run one of my test-to-fail exploratory tests, just to see the danged thing work:

(fact "I can apply error-vector to sine-data"
  (let [rm (->RegisterMachine [1] [1] [(->ProgramStep +' [0 1] 0)])]
    (count (error-vector rm 100 sine-data)) => (count sine-data)
    (error-vector rm 100 sine-data) => 88

That last line is the kicker, and the error is very informative for me. Recall that rm (the fixture RegisterMachine I’ve defined here) just counts up by one for every step you run it. We’re running it 100 steps in this call, starting from a value of x somewhere in the range \([-\pi, \pi]\), so it counts up to somewhere in the range \([100 - \pi, 100 + \pi]\). And then we ask, “How different is that number from \(sin(x)\)?” As you can see, the results are all pretty much what you’d expect, even for the randomly-chosen x values we’re using here.

FAIL "I can apply error-vector to sine-data" at (core_test.clj:202)
    Expected: 88
      Actual: (99.23408917820309 98.25406646827487 100.05571888311373 97.93550139099193 102.62832563665788 99.67986574794232 100.25892881507762 100.14463446623468 98.70654907679427 99.98665666185276 100.07767756224916 99.94452150524074 100.36230103431896 100.1231338542296 98.10599714361675 99.88466653600183 99.99999577841787 99.99653499049145 100.99591256927283 96.98569275903385 99.99966715967463 99.99650426680941 97.07622660916579 103.04685358191604 99.79771505651341 99.98620902187191 101.82994009969791 100.0670301554521 99.88103268624539 99.600648359457 99.21239565214243 99.98084520520356 99.20715846864302 102.16857887384546 100.97798261279988 97.42302104355461 99.10455877986703 99.99999319281733 100.49902575044403 99.9995797207688 101.84608938283766 101.03142547809554 99.99872532259744 99.82251265656909 100.51442532513747 102.37299822379165 97.41631716674983 99.95497419078578 99.80277871037538 99.29210235298426 101.2809920570527 98.68338321138923 98.76729736890255 102.1643209071571 101.00563452984461 99.9962221058587 99.9336270735547 100.02599058329432 100.17482858331239 100.01936145634934 98.51444800888513 99.96155907245452 99.77868385706378 99.98517832060051 99.88059519794894 98.67301693074853 100.14512655278769 99.9705086951184 98.06204690354345 99.94734084233667 101.95726828794932 100.41041324414142 98.48318239819866 101.64087612806031 97.07155383543768 100.1472108760928 98.97430099227186 101.5727579677903 101.66698816770162 99.66519090727164 101.05618326846655 99.53612050107516 100.02170643239694 100.60502322543734 99.86057938246861 100.76615974724574 102.49361802182013 100.49207844459181 101.95406664372683 100.3673478234284 102.52202111625016 100.14705189766507 99.61905172931476 101.93941696301023 97.64316702066941 98.41752703522525 100.02937687690883 99.98495647227575 99.96189718979745 101.98339069960424)
FAILURE: 1 check failed.  (But 52 succeeded.)

Yes, sure, I could have tested that the error minus the target was exactly 100. But honestly, how fun would that be?

Next time: No really, I am actually evolving these damned things, dammit