String chemistries: poking around

Draft of 2016.12.04

May include: puzzlegamerecreation&c.

Yesterday I described a simple sort of “string chemistry”, where a Virtual Chemostat contains string objects that bump into one another randomly, and which may rewrite one another if they contain the patterns defined in a simple “Chemistry Table”.

Here’s the Chemistry Table I handed suggested for exploration:

catalyst    substrate    product
--------    ---------    -------
abbba       bba          abab
baba        baab         bababb
aabb        babb         ba/bb
babab       aaaa         bbaab
bbbb        a            aa
aaa         abb          a/bb

And I asked what might happen if you started with exactly one copy of each 5-character string on the alphabet {a,b}.

I’m curious, too. So let’s find out.

In which we find out

I’m tempted to explore with ClojureScript and Quil, since that’s my recent focus with the lattice protein stuff I’ve been doing recently. But to be frank, it’s been ages since I’ve worked in Ruby, and I’m not planning on visualizing much yet. This is really just an exploration.

In fact, I’m going to be super stupid, and hack something together in the irb REPL, not even testing for now. Think of it as a design spike, because—to be honest—I want to make sure I gave you the “right” Chemistry Table.

ChemRule = Struct.new(:catalyst, :substrate, :product)

r1 = ChemRule.new("abbba", "bba", "abab")
r2 = ChemRule.new("baba", "baab", "bababb")
r3 = ChemRule.new("aabb", "babb", "ba\nbb")
r4 = ChemRule.new("babab", "aaaa", "bbaab")
r5 = ChemRule.new("bbbb", "a", "aa")
r6 = ChemRule.new("aaa", "abb", "a\nbb")

def apply_this_rule(rule,actor,target)
  if (actor.match(rule.catalyst) && target.match(rule.substrate))
    intermediate = target.sub(rule.substrate,rule.product)

puts apply_this_rule(r1,"abbba", "bbabba").inspect
# ["ababbba"]
puts apply_this_rule(r3,"aabb", "xxxbabbxxx").inspect
# ["xxxba", "bbxxx"]
puts apply_this_rule(r3,"aabb", "xxxx").inspect
# ["xxxx"]

I started here with a little definition of a ChemRule struct and an admittedly Clojure-ish imperative functional approach. What can I say; it’s a design spike, and I’ll be throwing it away when I’m done exploring.

The interesting thing about the two kinds of rules I defined for you in the problem description is the distinction between substitution rules and cutting rules.

I spent a little time when I was working on apply_this_rule(), thinking I’d make two kinds of ChemRule “classes”. But then I realized there’s a simpler way. So the cutting rules are defined just like the regular substitution rules, but with a "\n" newline character where the “cut” is intended to happen. When a rule is applied successfully, it creates a string called intermediate, and then that is immediately split (on newlines) into an array containing the two parts.

For consistency, the “substrate” string is returned in an Array when nothing is changed. I figure I’ll deal with the results more simply if they’re not too polymorphic at this stage, and an array of strings seems minimally sufficient.

The three “puts tests” spit out confirmation enough for me to check that the results are what’s expected.

So you’ll notice that this is just a function that applies a single specified rule to a single “catalyst” string and a single “substrate”. It seemed simpler to assume that I’d be checking which rule to apply in a wrapping conditional function of some sort. One quirk of the problem description I gave you is that there can only ever be a maximum of one rule applied to a string by another, no matter how many catalyst patterns and/or substrate patterns exist.

def react(ruleset,actor,target)
  feasible_rules = ruleset.keep_if do |r|
    actor.match(r.catalyst) && target.match(r.substrate)

  if (feasible_rules.length > 1)
    front_hit = feasible_rules.
      collect {|r| target.match(r.substrate).begin(0)}.min
    earliest_rules = feasible_rules.
      keep_if {|r| target.match(r.substrate).begin(0) == front_hit}
    active_rule = earliest_rules.first
    active_rule = feasible_rules.first

  if active_rule.nil?
     result = [target]
     result = apply_this_rule(active_rule,actor,target)
  return result

puts react([r1,r2,r3,r4,r5,r6],"babab","baabaaaa").inspect
# ["bababbaaaa"]

puts react([ChemRule.new("a", "b", "c"),
            ChemRule.new("a", "b", "X")],"bab","bab").inspect
# ["cab"]

puts react([ChemRule.new("a", "b", "X"),
            ChemRule.new("a", "b", "c")],"bab","bab").inspect
# ["Xab"]

This larger react() function was a bit more complicated. It’s been long enough since I’ve worked in Ruby that I totally forgot about the idiom for working with splat arguments, so things got a bit hairy when I was debugging. But eventually I worked it out, and I’ll spare you all the intermediate head-scratching.

The function is relatively straightforward in structure. The first argument is assumed to be an Array of ChemRule instances, and the second and third arguments are the “catalyst” and “substrate” strings again. We start by filtering out all the rules that can’t apply, by removing those for which the catalyst pattern isn’t found in the acting string and the substrate pattern isn’t found in the target string. In other words: you gotta have the patterns.

Then, if there are multiple “surviving” rules remaining, we need to work out which one will be applied. When I specified the problem, I confess I may have misremembered the rules I originally implemented in C back in 1994… but we can manage, right? The way we’re working here, I said that rule(s) with the substrate pattern that appears first (in left-to-right order) in the target string will be used. Because we’ve got the rules in an arbitrary order still, I first record the earliest position of any substrate pattern, and then remove all remaining rules where the defined substrate is not in that position.

You might wonder why it’s so fiddly. Well, if you look at the Chemistry Table I gave you, you’ll see there are several places where substrate patterns overlap. The most concerning are the substrates of rules 5 & 6: aaaa and a, respectively. If the catalyst has the option of applying either, and the target string is for example aaaaaaaaaa, then both rules’ substrate patterns start at position 0.

Anyway, there’s a conditional in the final function I’m not proud of, because I’d forgotten the case where no reaction is possible. Because I’m starting with the Array of all the rules, filtering away ones that don’t apply, and then taking some_rules.first in both branches, there’s a possibility that the result of that is nil. And because I’m lazy and didn’t want to refactor the nil guard up into apply_this_rule(), I have a stupid if block there.

Hey: design spike.

So what does it do?

ab_strings = ['a','b'].repeated_permutation(5).to_a.collect{|p| p.join}

same = 0
diff = 0

ab_strings.combination(2).each do |combo|
  x = combo[0]
  y = combo[1]
  products = react([r1,r2,r3,r4,r5,r6], x, y) + 
             react([r1,r2,r3,r4,r5,r6], y, x)
  [x,y].sort==products.sort ? same += 1 : diff += 1
  puts "#{x} + #{y} => #{products} #{[x,y].sort==products.sort ? :same : :diff}"

puts "same : #{same}, diff: #{diff}"

This little exerciser makes every possible 5-symbol string, and bumps each one against every other one to see what happens, with a bit of counting logic in there to tally up how many possible interactions actually change the interacting strings at all. It’s worth pointing out that what I’m collecting here is both interactions. Recall that the problem description specifies that if both strings contain a catalyst pattern that acts on some substrate pattern in the other, then both are “simultaneously” changed.

When I run it, I get something like:

aaaaa + aaaab => ["aaaab", "aaaaa"] same
aaaaa + aaaba => ["aaaba", "aaaaa"] same
aaaaa + aaabb => ["aaa", "bb", "aaaaa"] diff
aaaaa + aabaa => ["aabaa", "aaaaa"] same
aaaaa + aabab => ["aabab", "aaaaa"] same
aaaaa + aabba => ["aa", "bba", "aaaaa"] diff
aaaaa + aabbb => ["aa", "bbb", "aaaaa"] diff
aaaaa + abaaa => ["abaaa", "aaaaa"] same
aaaaa + abaab => ["abaab", "aaaaa"] same
aaaaa + ababa => ["ababa", "aaaaa"] same
aaaaa + ababb => ["aba", "bb", "aaaaa"] diff
aaaaa + abbaa => ["a", "bbaa", "aaaaa"] diff
aaaaa + abbab => ["a", "bbab", "aaaaa"] diff
aaaaa + abbba => ["a", "bbba", "aaaaa"] diff
aaaaa + abbbb => ["a", "bbbb", "aaaaaa"] diff
aaaaa + baaaa => ["baaaa", "aaaaa"] same
aaaaa + baaab => ["baaab", "aaaaa"] same
aaaaa + baaba => ["baaba", "aaaaa"] same
aaaaa + baabb => ["baa", "bb", "aaaaa"] diff
aaaaa + babaa => ["babaa", "aaaaa"] same
aaaaa + babab => ["babab", "bbaaba"] diff
aaaaa + babba => ["ba", "bba", "aaaaa"] diff
aaaaa + babbb => ["ba", "bbb", "aaaaa"] diff
aaaaa + bbaaa => ["bbaaa", "aaaaa"] same
aaaaa + bbaab => ["bbaab", "aaaaa"] same
# (... and so on)

Amusingly, I was about to write a little thing that pointed out two copies of aaabb acting on one another will produce ["aaa", "aaa", "bb", "bb"], because that’s a good check to have run… but of course I’m using the Array#combination function here, and that doesn’t do repetition. But when I check by hand, yes: that works too.

Anyway, you can see the same and diff notes next to each reaction reported, and when I tally those I see:

same : 279, diff: 217

So, of the \(32^2\) possible interactions between strings of length five, there are \(32\) I forgot to check (the self-interactions), and a little less than half produce a detectable change in the interacting strings. I suppose it’s feasible that some of them may be “changed” to identical strings… but I don’t think so?

Maybe you can check for me.

Hints and allegations

So you’ll recall that when I sketched this problem, I asked what “you see” when you start with only one copy of each of the 32 strings of length five.

As I thought, it’s very doubtful for this Chemistry Table of six rules that you’ll ever “settle down” into a situation where no collisions between strings produce a change. Super doubtful.

What I did to explore this (not shown) was to just bump the strings together randomly in pairs, deleting the bumped ones and replacing them in the “pool” with the products of the reaction between them. Which, of course, are the same strings when nothing happens, but you knew that, right?

The six-rule Chemistry Table1 tends to make a lot of long strings over time, and an awful lot of little chunks. But because of the sampling error when there is only one copy of each of the 32 5-character strings, there are at least two or three qualitatively different “directions” things can go.

And that’s fine.

If you have the patience, you may find it interesting to explore the way these dynamics stabilize as you increase the number of copies of each string initially. If you run a simulation starting from 1, 10 and 100 copies, you’ll see things start to “smooth out” and become more “chemistry-like”. Of course it will take many more collisions between strings in the latter cases, since when you’ve only got one of everything, any reaction that consumes one mean it’s possibly gone forever. In the case with 100 copies of each string, every collision and reaction can at most change the “concentration” of a string by about 1%.

The thing to think about here is the effect of simulation dynamics and path-dependence on your results. The situation when there is only one copy is very different in “meaning” from the situation where there are 10000 of each, don’t you think?

Say yes now.

A lot happening

I also asked what if there are “qualitative” differences in the way the string concentration dynamics unfold, when you compare the full set of six rules, and the six variations with only five rules.

The answer is: absolutely. But it’s also an interesting challenge for visualization, and really one of the better aspects of these systems as far as I’m concerned. How do you see what’s happening, if there’s an almost-unbounded number of “things” in the system?

Wait, why do I say “almost-unbounded”?

Well, notice something about the rules: All of the substitution rules replace the substrate pattern with a longer product pattern. Strings tend to grow as a result. There are only two cutting rules, but they affect shorter substrate patterns than the substitution rules, and therefore you might think they’re more common. And that would be the case, if all we were talking about was the probability of various outcomes in some theoretical stead-state, with an effectively infinite number of copies of everything in the “soup” all at the same time.

But you’ll see that some reactions are constantly adding new strings that haven’t been present before, and (potentially) deleting old ones.

I ran the little function I showed above counting the number of changes made when I examine all possible interations between pairs of different strings, and when each rule was deleted from the set of six. For the set of all possible strings of length 5, this is what I see when each of the six rules is “turned off”:

all     same : 726,  diff: 1290
-r1     same : 792,  diff: 1224
-r2     same : 848,  diff: 1168
-r3     same : 787,  diff: 1229
-r4     same : 750,  diff: 1266
-r5     same : 1087, diff: 929
-r6     same : 1176, diff: 840

Basically what seems to be happening here is something to do with the relative frequency of the catalyst and substrate patterns in random strings. You can see a large effect when r6 is turned off, which I suspect is more or less “the chance of seeing aaa and abb in two strings”. But even in this little six-rule Chemistry Table… it’s not that simple. Notice that those three-character patterns overlap other patterns that earlier rules recognize.

Like I said: it’s worth thinking creatively about how to visualize what’s happening. Bar charts, for example… except the number of kinds of string increases over time, and the number of copies of each kind is probably getting wildly out of balance. How might you visualize the “space” of all possible strings, as a sort of abstract graph maybe, and color things as the strings move around?

I also asked what you observe what happens when you start with one copy of each 10-character string.

Heh. To be honest, maybe you should work up to that, you know what I mean?

Here’s what I observed when I tallied up the number of possible interactions for “just” the 256 8-character strings:

all     same : 4174,  diff: 28466
-r1     same : 5053,  diff: 27587
-r2     same : 6421,  diff: 26219
-r3     same : 4748,  diff: 27892
-r4     same : 4608,  diff: 28032
-r5     same : 8239,  diff: 24401
-r6     same : 11097, diff: 21543

Notice anything different, besides the fact that there are about 32k pairs now? No? How about if we look at the relative proportion of collisions that change the strings?

        5-char    8-char
all     0.640     0.872
-r1     0.607     0.845
-r2     0.579     0.803
-r3     0.610     0.854
-r4     0.628     0.859
-r5     0.461     0.748
-r6     0.417     0.660

Now you see it? The raw likelihood that a pair of strings of length 8 will change one another (in some way) is much higher than the chance for 5-character strings. So there’s going to be a lot going on when you get to 10-character strings. Some of that change is also inevitably going to be making even longer strings… and so forth.

See where this leads?

Interesting, isn’t it? How’s it feel to be playing with a truly open-ended system? Weird, compared to the sorts of computation we normally deal with. There is no “halting problem” here, because it’s absolutely obvious that there’s going to be an eventual explosion.

Transients and steady state

The interesting thing in these systems isn’t so much what happens when they unfold with all the “free energy” we’ve given them, but what happens in more “realistic” conditions.

You’ll recall that I named the “reaction vessel” a “Virtual Chemostat”. A chemostat isn’t a closed system: stuff flows in, and stuff flows out. The idea (in the real-world chemostat) is to provide nutrients to organisms growing in the vessel, and remove their waste products—including the dead ones.

To make this system a lot more like the “traditional Artificial Life” systems on which it’s based, let’s make two minor changes and see what happens.

Let’s build a chemostat of interacting strings, still. We’ll still start with the 32 5-character strings in the soup, and bump strings against one another randomly, with uniform probability (taking into account how many copies of each there are, of course).

But let’s add two additional rules.

  1. Whenever we finish a simulated reaction and end up with more than 10000 strings, we will delete strings at random, with uniform probability, until we get down to 10000 max, with this exception:
  2. The number of copies of our initial “seed” strings will never fall below the starting count.

Say we start with 100 copies of each of the 32 5-character strings. We bump them together randomly (and their reaction products) until we reach 10000 strings total—but as we’re doing it, we never permit the count of any of those strings to fall below 100 instances. They can go above 100, but never below.

And as soon as a reaction creates some 10001th string, we’ll pick one at random and delete it, making sure that the number of copies of each 5-character string never falls below 100.

Now… something very different is going on, isn’t it?

What do you see? You’re welcome to fiddle with the numbers, insofar as your computer infrastructure wants them to be smaller or bigger. Maybe you can see the way towards a “more continuous” approach now, too?

One thing I know: My question about qualitatively different “outcomes” for different subsets of the Chemistry Table makes a lot more sense now. In this more “realistic” chemostat, there will be an initial period we might want to call the “transient”, and then at some point after the outflow process begins and there’s a reduction in some of the “weird stuff” happening, the system will approach a “steady state”.

The nature of that steady state will have changed, given consistent settings for the other parameters. It’ll actually be sort of steady, for one thing.

Think of the “seed” as a kind of “nutrient” we’re piping into the Virtual Chemostat, and as the system dynamics build all kinds of exotic new strings we’re letting them pile up and interact with one another… but only until things start to get crowded. Then the effluent will start washing away the random big-ass strings like bababababbaaaaaaaaaaaaaaaa..., which you’ll no doubt start to see.

And now we’re playing with something that’s way closer to modern experimental Artificial Life systems. If we wanted to be even more “realistic” (in a kind of skeuomorphic way that’s unpopular these days), we might change the way our inflow and outflow work, for instance maybe we could add one copy of each of the 32 5-character strings every “tick” before a reaction, and delete a random 1/1000 of all the molecules present after the reaction. Or something like that.

The details, in this case, aren’t as interesting as the fundamental Design Pattern: We’re building a system that does all kinds of cool emergent junk.

But you and I still have the significant challenge of visualizing what’s happening.

Let’s talk about that soon. See what you can come up with.

  1. I’d call it a grammar, except there are some historical reasons that word may not be the most appropriate here. Mainly these involve the fact that it’s stochastic, and that there are so many constraining rules regarding the order of application of rules and so forth, that it’s easier to fudge and skip the extra twenty words that would have to modify “grammar” for the word to be technically accurate in context. So: “Chemistry Table”!