Draft

Making a Register Machine

Draft of 2016.07.27

Continued from “A quick Artificial Chemistry in Clojure”

I’m going to be working in Clojure. There’s literally no reason I needed to do that. This system is so simple (seriously, not exaggerating) that you could do the same work in Javascript, Haskell, Ruby, Python, whatever you want. Indeed, I think you probably ought to. It’s a quick one-day project, if you’re interested.

Let me know how it turns out!

Starting steps

I think I should probably start with a simple repository to work in. I set one up… and immediately mess it up, because I forgot the damned .gitignore and so it’s already littered up with .DS_Store and .lein-repl-history files before I’ve done any real work. Dagnabbit!

But cleanup isn’t that hard (I hope), and it just leaves a little detritus in the repository’s history. There’s the stuff the Mac OS adds (.DS_Store for example), there’s random histories and partially-compiled files (Clojure /target directory, for example) and so on. I copy-and-paste from a big project I find sitting around, and we’ll hope I got all the pieces.

Since I wrote that list of features last time I’ve been thinking about how to start off. Today I want to build some kind of “register machine” that can contain arbitrary code and registers, and do the random “running” thing. Then I can build some kind of evolutionary search thing on top of that.

So what is a “register machine” again? It’s fundamentally a collection of “program steps”, each of which reads some registers, does some math on those values, and writes a “result” to one register. And of course it’s a collection of the registers themselves. In all the work I’ve seen on these, the authors are pretty formal about naming these registers, but to be honest they could as easily work positionally. But also some of the registers are “read only”, and some are “read/write”.

The simplest way I can think of to approach this is to use an ordered collection, and since I’m in Clojure I expect it ought to be a vector.

I think I’ll write a test first. I want a thing called a RegisterMachine. Make it a record, give it some register values and code. I think I ought to have :read-only and :connectors as the two types of registers described in the paper, and a collection of steps as a :program. Here’s a failing test that expresses those intentions:

(fact "I can make a new RegisterMachine"
  (:read-only (->RegisterMachine [1 2 3] [4 5 6] [:foo])) => [1 2 3]
  (:connectors (->RegisterMachine [1 2 3] [4 5 6] [:foo])) => [4 5 6]
  (:program (->RegisterMachine [1 2 3] [4 5 6] [:foo])) => [:foo]
  )

I don’t have a definition yet for ->RegisterMachine, so let me make one to get these to pass. It’s pretty easy, actually (and if you know Clojure, you know why I say that, because of the ->X form I used). All I need to do is define a new record with the requisite components:

(defrecord RegisterMachine [read-only connectors program])

I’m thinking the three vector arguments will be literal values when I build one of these things. So the first one will be the literal values assigned to the :read-only registers, the second one will be the literal values assigned to what they call “connection registers” in the paper, and the third one will be a pile of program steps.

I guess I want to be able to read and write individual registers more easily next. Certainly I’ll want to be able to do that in defining the program steps, since that’s pretty much how those work.

This feels like as good a start as any:

(fact "I can read `:read-only` and `:connectors` registers"
  (let [testing (->RegisterMachine [1 2 3] [4 5 6] [:foo])]
    (value-of testing :read-only 1) => 2
    (value-of testing :connectors 1) => 5
  ))

I want a function called value-of that reaches down into the record and returns the current value. Which is more or less a wrapper around get-in, just to clarify the syntax a bit. Did I even need to write this? I don’t know:

(defn value-of
  "Takes a RegisterMachine record, field 
  (`:read-only` or `:connectors`) and index. 
  Returns the value stored there."
  [rm field idx]
  (get-in rm [field idx]))

There are a lot of validations I feel I might want some day, but you know what? I can probably deal with those when they become a problem.

Probably I want to be able to write to the :connectors, too. Same deal:

(fact "I can write to `:connectors` registers"
  (let [testing (->RegisterMachine [1 2 3] [4 5 6] [:foo])]
    (:connectors (write-value testing 1 99.99)) => [4 99.99 6]
  ))

which fails (because I haven’t added this code yet):

(defn write-value
  "Takes a RegisterMachine record, index, and new value.
  Updates the value stored in the `:connectors` vector at
  that index."
  [rm idx number]
  (assoc-in rm [:connectors idx] number))

“Programs”

Now the sense of overconfidence is going to dissipate, because it’s time to write some :program stuff. What will the steps be? Based on the original paper, I’ll want all these sorts of things:

  • add, two arguments, numerical
  • sub, two arguments, numerical
  • div, two arguments, numerical; also, hey what about division by zero?
  • mult, two arguments, numerical
  • pow, two arguments, numerical; also also, what about even roots of negative numbers and stuff?
  • and, two arguments, “logical”… but can I use 1.0 and 0.0 to represent true and false? If I do, how should I treat non-binary numerical inputs? Something like Clojure does, where anything is true?
  • or as above
  • not, one argument

Well, that starts off “obvious”, but then it gets interesting pretty quickly. As is ever the case.

What I want here is some kind of generic data structure that will hold one of these functions, the indices of its argument or arguments, and the index to which it should write the result. Sound at the moment like another record type, maybe a ProgamStep or something?

(fact "I can make a new ProgramStep"
  (:args     (->ProgramStep +' [9 12] 3)) => [9 12]
  (:function (->ProgramStep +' [9 12] 3)) => '+'
  (:target   (->ProgramStep +' [9 12] 3)) => 3
  )

Which is satisfied with another glib record definition like this:

(defrecord ProgramStep [function args target])

(If you’re unfamiliar with Clojure syntax, you might be wondering what +' means in that code, or '+' for that matter. Clojure has two variant “addition” functions: one called + which will overflow if you give it two numbers whose sums exceed the limit of their own type (so for example two huge long values added together will overflow), and another called +' which is an auto-promoting arithmetic function. In the latter, when you add two numbers that would overflow their own type’s bounds, they’re invisibly converted to the next-largest type. In the case of integer values that will be a BigInteger arbitrary-precision type.

Oh, and the preceding quote mark sitting in front of any Clojure symbol like 'foo, means you’re referring to the symbol. So you can type Clojure code like (:function (->ProgramStep +' [9 12] 3)) => '+', which in plain English means something like, “When I construct a ProgramStep with a first argument of the auto-promoting addition function, and look at the :function component of the record that is produced, the auto-promoting addition function will be the value associated with the key :function. Make sense?)

I suppose what I’m planning here (though I swear I’m really not planning, just letting it unfold as I go), is that there will be a series of invoke or apply or something, taking a ProgramStep and a RegisterMachine, and returning the new RegisterMachine you get by applying that ProgramStep.

I like invoke, and besides apply is already taken in Clojure. So:

(fact "I can invoke a ProgramStep 'on' a RegisterMachine"
  (let [ps (->ProgramStep '+ [5 1] 0)
        rm (->RegisterMachine [1 2 3] [4 5 6] [:foo])]
  (invoke ps rm) => (->RegisterMachine [1 2 3] [8 5 6] [:foo])
  ))

There’s an interesting thing I discover writing that: I’ve divided the :read-only and :connectors indices into two collections, but I want functions to be able to read arguments from either batch. I consider for a moment whether there should be another argument of some sort that says “what kind” of index I’m asking for… but that just feels like I’ll pay for it later. There’s no difference between the two piles from the “reading side” (just the “writing side”), so I decide instead to treat the indices as if the two vectors were concatenated before indexing.

That changes the way my value-of function works. I make changes to my tests and code:

(fact "I can read `:read-only` and `:connectors` registers"
  (let [testing (->RegisterMachine [1 2 3] [4 5 6] [:foo])]
    (value-of testing 0) => 1
    (value-of testing 3) => 4
    (value-of testing 5) => 6
  ))

which now works this way:

(defn value-of
  "Takes a RegisterMachine record and index. The index refers
  to the index in the _concatenated_ `:read-only` and `connectors`
  vectors. Returns the value stored there."
  [rm idx]
  (nth (concat (:read-only rm) (:connectors rm)) idx))

I think I can get away with that. I’m half-tempted to make all the registers into one vector and somehow specify which are read-only in another way. But… nah. Something better will probably present itself. I’ll optimize when it smells too bad.

Now for invoke. I’ll spare the back-and-forth between the test(s) and source code for this one. It essentially says “get the indexed argument values (referring to the concatenated vector of all registers), and do the math to those, and stick the result into the indexed position in the :connectors vector”.

(fact "I can invoke a ProgramStep 'on' a RegisterMachine"
  (let [rm (->RegisterMachine [9 8 7] [4 5 6] [:foo])]
    (invoke (->ProgramStep + [5 1] 0) rm) =>
      (->RegisterMachine [9 8 7] [14 5 6] [:foo])

    (invoke (->ProgramStep - [2 1] 0) rm) =>
      (->RegisterMachine [9 8 7] [-1 5 6] [:foo])  

    (invoke (->ProgramStep * [0 3] 0) rm) =>
      (->RegisterMachine [9 8 7] [36 5 6] [:foo])

    (invoke (->ProgramStep / [4 3] 0) rm) =>
      (->RegisterMachine [9 8 7] [5/4 5 6] [:foo])
  ))

The only trick in doing this was an incidental fluke of Clojure syntax.

(defn invoke
  [ps machine]
  (let [indices (into [] (concat (:read-only machine) (:connectors machine)))
        values  (into [] (map indices (:args ps)))
        result  (apply (:function ps) values)]
    (assoc-in machine [:connectors (:target ps)] result)
    ))

I knew already about the idiom (map [9 8 7 6 5] [0 1]), which returns a collection of the first two entries of the vector [9 8 7 6 5] (in other words, '(9 8)). But it turns out that you can’t do that when the first argument is the result of a concatenation. That is, this won’t work, and throws an exception:

user=> (map (concat [9 8 7 6 5] [77 77]) [0 1])
ClassCastException clojure.lang.LazySeq cannot be cast to clojure.lang.IFn  clojure.core/map/fn--4785 (core.clj:2644)

And that’s why there are those two (into []...) wrappers in this function.

Edge cases

I have a feeling I’ll want to deal with some edge cases now, just to avoid worrying about them later on. For instance, as I said above, there’s the question of dividing by zero.

Amusingly, this isn’t mentioned at all in the original paper. Since it’s 2004, and since there’s nothing about handling exceptions when running these register machines, I’m going to guess that they used some sort of protected division and didn’t mention it. Traditionally1, what happens in Genetic Programming settings where division by arbitrary values can occur, attempting to divide by 0.0 produces a result of 1.0. I suppose I’ll do that here.

(fact "protected division returns 1.0 instead of blowing up"
  (let [rm (->RegisterMachine [0 0 0] [0 0 0] [:foo])]
    (invoke (->ProgramStep pdiv [4 3] 0) rm) =>
      (->RegisterMachine [0 0 0] [1.0 0 0] [:foo])
  ))

In that test I’m dividing , and the result is 1.0 instead of an exception. I notice that I should go back and change my previous tests, the ones that used / instead of pdiv, so:

(fact "I can invoke a ProgramStep 'on' a RegisterMachine"
  (let [rm (->RegisterMachine [9 8 7] [4 5 6] [:foo])]
    (invoke (->ProgramStep + [5 1] 0) rm) =>
      (->RegisterMachine [9 8 7] [14 5 6] [:foo])

    (invoke (->ProgramStep - [2 1] 0) rm) =>
      (->RegisterMachine [9 8 7] [-1 5 6] [:foo])

    (invoke (->ProgramStep * [0 3] 1) rm) =>
      (->RegisterMachine [9 8 7] [4 36 6] [:foo])

    (invoke (->ProgramStep pdiv [4 3] 0) rm) =>
      (->RegisterMachine [9 8 7] [5/4 5 6] [:foo])
  ))

While I was writing those tests, I guess I sort of “accidentally” implemented add, sub, mult and div. They’re free, more or less, since they’re core Clojure functions. But pow is a little different, because I’ll need to use clojure.math.numeric-tower for that.

Because I’m using lein to run things, I modify my project.clj file to include the new dependency:

(defproject artificial-chemistry "0.0.1-SNAPSHOT"
  :description "Artificial Chemistries, based on Banzhaf & Lasarczyk, 2004"
  :dependencies [[org.clojure/clojure "1.8.0"
                 [org.clojure/math.numeric-tower "0.0.4"]]]
  :profiles {:dev {:dependencies [[midje "1.8.3"]]}})

I bring this up because (from experience) I know there are other edge cases when one starts raising numbers to other numbers willy-nilly. For example:

(fact "exponentiation doesn't blow up"
  (let [rm (->RegisterMachine [-2 1/4] [0] [:foo])]
    (Double/isNaN (first (:connectors 
        (invoke (->ProgramStep pow [0 1] 0) rm)))) =>
      true
))

Interestingly, when I take the square root of -2 using clojure.math.numeric-tower/expt, I get NaN as a result, and not an exception like I expected. That’s an interesting outcome, and it makes me wonder what I should do.

Should I just let NaN propagate through our system when this happens? The value is permitted to be an argument in almost all Clojure functions I’ve tested through the years, and it’s contagious. So for example (+ 9 NaN) gives a result of NaN. Or should I “protect” exponentiation in the same sort of way I protected division? Or should I unprotect division, and make it return NaN values when it attempts to divide by zero?

Decisions, decisions.

I decide to take the easiest way out, and just let those NaN values propagate. If something happens downstream, I’ll find out soon enough. (Do you hear foreshadowing music?)

That leaves pow as a simple wrapper around clojure.math.numeric-tower/expt:

(defn pow
  "Exponentiation"
  [base exponent]
  (math/expt base exponent))

But I sense I’ll be revisiting that shortly.

Not logical

That leaves the nominally “logical” functions to go. What then should I do to implement and, or and not, here?

I have a strong suspicion that the authors really mean their “logical” operations are still numerical in nature. I’ll bet money they model true as 1.0 and false as 0.0, in classic Computer Science style.

I can do that too, in other words. Plus Clojure itself tends to use “anything” to represent true values, and “nothing” (empty collections, nil sometimes) for false. So by fiat I can say something like “any non-zero numerical value, positive or negative, is interpreted as true, and zero is interpreted as false”. At least to get started.

Let’s start with our internal version of not:

(fact "rm-not treats a numeric argument as if it were a boolean"
  (rm-not 9) => 0.0
  (rm-not 0.0) => 1.0
  (rm-not 1.0) => 0.0
  (rm-not -9/13) => 0.0
  )

Which is simple enough to implement:

(defn rm-not
  "logical not on a numerical value"
  [n]
  (if (zero? n) 1.0 0.0))

How about and?

(defn rm-and
  "logical `and` on numerical values"
  [n1 n2]
  (let [n1z (not (zero? n1))
        n2z (not (zero? n2))]
  (if (and n1z n2z) 1.0 0.0)
  ))

(fact "rm-and treats its arguments as if they were booleans"
  (rm-and 9 8) => 1.0
  (rm-and 9 0) => 0.0
  (rm-and 0 0) => 0.0
  (rm-and 0 1) => 0.0
  (rm-and -3 -2) => 1.0
  )

And or:

(defn rm-or
  "logical `or` on numerical values"
  [n1 n2]
  (let [n1z (not (zero? n1))
        n2z (not (zero? n2))]
  (if (or n1z n2z) 1.0 0.0)
  ))

(fact "rm-or treats its arguments as if they were booleans"
  (rm-or 9 8) => 1.0
  (rm-or 9 0) => 1.0
  (rm-or 0 0) => 0.0
  (rm-or 0 1) => 1.0
  (rm-or -3 -2) => 1.0
  )

Putting it through its paces

Surprisingly, I think I’m done. More or less. Let’s see what I can do now, and what else is needed.

I want to be able to make programs, not just individual program steps. How about a nice random program step first, though?

(fact "random-program-step"
  (random-program-step all-functions 10 12) =>
    (->ProgramStep :FOO [17 17] 7)
  (provided (rand-nth all-functions) => :FOO,
            (rand-int 22) => 17
            (rand-int 12) => 7))

Here I’m using Midje’s excellent providing clause to stub out the random number generation I’m writing. Instead of returning pseudorandom values, the stubbed functions (when called with those arguments) produce the specified results. Thus, when I call this code, which makes a random ProgramStep, I get the specified results from each random call:

(defn random-program-step
  [functions readonly connectors]
  (let [readable (+ readonly connectors)]
  (->ProgramStep
    (rand-nth all-functions)
    (vector (rand-int readable) (rand-int readable))
    (rand-int connectors)
    )
  ))

Which makes this function testable!

The problem of not

There’s one thing worrying me: Of all the functions I’ve defined, only rm-not is arity-1. All the rest take two arguments. When I tested (apply not [1 2]) in the REPL, it raised an exception. So I should probably work out a way to ensure there is only one argument when the random-program-step is an arity-1 function.

That said, I don’t want to just kludge an if statement that checks whether the function being matched is specifically rm-not. That’s discomforting. So there’s some kind of abstraction hidden in here that wants to “know” about the arity of the functions it’s building.

I can see a simple way around it for now, and it adds just enough flexibility that I might want to keep it over time (assuming I keep playing with this system). I change all-functions to be a hash:

(def all-functions
  {+' 2,
   *' 2,
   -' 2,
   pdiv 2,
   pow 2,
   rm-and 2,
   rm-or 2,
   rm-not 1})

Where the value associated with each function is the arity of that function. It’ll do for now, though it entails a little rewrite of random-program-step:

(defn random-program-step
  [functions readonly connectors]
  (let [readable (+ readonly connectors)
       [which-fxn arity] (rand-nth all-functions)]
  (->ProgramStep
    which-fxn
    (into [] (take arity (repeatedly #(rand-int readable))))
    (rand-int connectors)
    )
  ))

Not quite as readable as it was, but it seems to work OK with rm-not now.

(fact "random-program-step with arity 1"
  (random-program-step all-functions 10 12) =>
    (->ProgramStep rm-not [17] 7)
  (provided (rand-nth all-functions) => [rm-not 1],
            (rand-int 22) => 17
            (rand-int 12) => 7))

But that’s as far as I can get today. The gas meter is being replaced now. Hope we don’t blow up!

I have a strong feeling, suspicion, intuition, all of those that I’ve made a bad mistake somewhere along the line. Like I missed an important test, or failed to see an obvious flaw. The one thing I know is that generating and running a whole pile of random programs will certainly be a good way to discover whatever it is I’ve missed.

Next time: I didn’t blow up! (but this does)

  1. But not in the Klapaucius interpreter, where division by zero actually does produce an :error result instead of a numeric value.