But you can choose your neighbors

Draft of 2016.10.14

I was paging through 25-year-old old notebooks, from back in the days when I was nominally a Biochemistry and/or Complexology student. I can see why it worked out so badly for me in the Academy: the wet lab notebooks are filled in the margins with little combinatorics problems and diagrams of dynamical systems designs, and the complex systems notebooks are filled with little questions about prospects for laboratory experiments we never had the funding the pursue. I’ve always had the bad habit of talking about some other discipline when I’m supposed to be focused in one.

Maybe it’s a charitable tendency on my part. Being interdisciplinary lets more people, in more professional fields, think of your work as off-topic, and that makes them feel better about themselves. I’ve always been nice that way. Even nowadays when I’m writing code, there’s that Beginner’s Mind thing I do, where I work as if I don’t know anything about the problem at hand (even when I’ve solved it before). I can attest that’s a fast way to make “professional” coders frown and think you’re an idiot, and many of them love thinking folks are idiots.

He said in a nonchalant and level voice, never even glancing towards their code….

Anyway, in the margins of a lab notebook where I was (it turned out) waiting for my Hosta sieboldiana chloroplast DNA extraction to be chopped into random garbage by contaminated restriction enzyme buffers, I find a drawing of lattice proteins, and I even vaguely a little bit can remember what I was sketching at the time.

The Wikipedia article on lattice proteins was obviously written by somebody who isn’t interdisciplinary enough—and yes, I know, I should “fix it” if I think that; for now, let me say in passing that it’s criminal that there’s no pictures. Even if you’re a biochemist, it won’t help you much.

There are better visualizations and explanations at Ken Dill’s lab page (since he’s one of the main inventors of the sort of abstraction we’ll be looking at here). And here’s a foundational paper that I can see for free online; there are earlier references, but they’re all paywalled because everybody knows esoteric theoretical chemistry papers from 25 or more years ago are a valuable intellectual resource. And also there’s this lovely little dojo-style two-day exploration I stumbled upon just now, by Louis Maddox.

It’s funny how little known lattice protein stuff is, outside of theoretical biochemistry. I suppose that’s down to Abbott’s System of Professions (again), but it may also be because the more abstracted complexological approach the HP models (and simpler ones) represent is out of favor a bit. I expect Big Data likes to imagine that Made Up Data (complexological statistical mechanics) is nice as a sort of acceptance test, but not, you know, realistic enough.

Don’t get me started on the inherent irreality of data. Not now.

Anyway, I’m all about the Made Up Data. I’ll start with simple lattice polymers like those in the foundational papers, and then we’ll abstract them and make them way less biological and way more interesting.

Beads on a String on a Chessboard

Imagine I have a string of beads. They’re solid little things, connected by a sort of special “chain” linkage that only permits them the “backbone” to adopt 90° bends. To simplify things (for now), I’m going to leave the string of beads on this table here, which you will note is conveniently covered with a checkered tablecloth. You know how this turns out already: “Hey look!” I say, “In an unlikely coincidence, the distance between each bead exactly matches the size of the tablecloth squares!”

“Puzzle people,” you mutter, but decide to bear with me.

So then I demonstrate how I can lay out the chain of beads in a straight line, like this:


and how I can twist it around like this (where I’m drawing the connections just to make the chain more obvious):

 …-F A-B
   |   |

And so forth. “Yay,” you mutter, seeing where this is going already, “Are we going to be counting self-avoiding random walks? I love difficult counting problems where the answer can be looked up easily but you ask us to count them anyway, Tozier. Let’s do another one of those. Hurrah.”

I pause, and look you in the eye as I continue. “Yes, it’s a self-avoiding random walk on a lattice. An arbitrary lattice polymer can be arranged in any way so that the distance between adjacent letters in the chain is always one horizontal or vertical step, and so that only one letter is in a square on the checkerboard. But—”

It takes a strong will not to put your hand on your face whenever I say “but” that way.

“But! There is also an interactive force between the beads on the chain. Some of them want to be close to one another, and some of them want to be far away from others.”

The HP model: not the point

So in the HP models developed in the 1980s, we can think about the chain of beads as being of amino acid residues in a polypeptide chain. Suppose there are only two types of amino acid present: an H (hydrophobic) residue/bead/letter, and a P (polar) residue/bead/letter. For any given configuration of a chain—which is to say, for any self-avoiding random walk on the lattice—every adjacent pair of HH letters, whether connected on the backbone or just adjacent on the lattice, contributes a “goodness point”. Any PH or PP pair provides no goodness.

We can imagine this snaky bead-chain on the table as being a momentary configuration of a polymer in solution, and every fraction of a second the chain “wiggles” a bit, within the scope of its geometric constraints. A stretched-out linear chain has the minimum goodness, but any configuration that brings more H beads next to one another on the lattice will have more goodness. The configuration(s) that bring all the H residues together in a compact square or blob, surrounded by P residues, will probably have close to maximum goodness.

By now you can imagine that “goodness” is something about configuration energy. And that the dynamical process of an extended chain “crumpling up” into a compact form is something like the process of protein folding in solution. And if you’ve ever been exposed to modern biophysics or biochemistry you might be connecting the dots and realizing that the suite of all possible configurations of the chain on the tablecloth is necessary but not sufficient to define an energy funnel for folding up bead-strings.

The “goodness” measure isn’t sufficient to define the funnel for this system, because a set of possible configurations is merely a set. To define a landscape (or “funnel”, for the upside-down energy people, like me), you need two things: (1) a means of assigning a scalar value to each configuration, which is our “goodness” of HH adjacencies, and (2) one or more moves for changing one configuration into its “neighbors”. Uniform guessing is actually a move, and in fact there is no worse one available. A lot of people with practical experience in real-world macroscopic systems will probably want a move that’s more like “make the tiniest possible incremental change”, because we’re used to dealing with situations that aren’t multiscale.

But if you imagine these little lattice polymers as being snapshots of a dilute solution of physical objects, taken in aggregate, wiggling around randomly in some kind of solution (because temperature), you can imagine that in general they will tend to “fold up”. Because of the randomness of things, they probably won’t all fold up to optimal configurations with maximum goodness. And you don’t want to think about what happens if they’re not in dilute solution, and they start folding up around each other, but I think you can grasp the complexological notion here: There are many (technically, a shit ton) of possible configurations for any reasonably long chain, and very few kinds of interaction defined (just one!), but nonetheless all kinds of complicated dynamics ensue.

It’s not what really happens. But as a sort of stylized fact it captures and isolates a little bit of the real-ish parts of the world. If you tend towards Physics, you might be tempted to draw histograms of the various energy levels of different configurations, or explore the effects of adding a bit more or less random noise to the “moves” you’ve chosen, and see whether temperature stability is an emergent property of these simple polymers. You might be tempted to think in terms of the Shannon entropy of the sequence, and its relation to the folding dynamics of chains.

To pursue any of that, you’ll need to decide on which moves these chains can make, transforming from one configuration to the next.

First, a different model

We’re going to talk about the effect of moves shortly. First, though I want to change things up a bit.

The sketch in the margins of my (failed) restriction digestion notebook page wasn’t about the HP model. It was about branched polymers, not single chains.

Here’s some notation, of the sort you might use to describe an L-system.

: H-P-H-P-P-H-H-P-H-P-P-H-P-H-H-P-P-H-P-H

: H-H-P-P-H
  |     |
  |     H-P
    | |
    | P

At every point where there’s a square bracket, a new branch is begun from the chain. In order to lay them out on the character grid above, I’ve made some of the connections longer, but in our lattice polymer model, these all have to be adjacent squares.

Already I think you can see there could be problems with our fundamental self-avoiding lattice constraint. The notation permits arbitrary branching, but there can only ever be four immediate neighbors of any residue in the polymer. So there must be an additional rule, if I’m to define feasible branching structures: no residue can have more than four neighbors: two branches, if it’s interior to a chain, or three if it’s on an end.

:   P-P

Interestingly, the “end” is hard to think of as the “end” when a terminal residue has even one branch. So there’s something going on here that makes me want to start talking about normalized representations and equivalence relations. But seeing your hand on your face, I will forbear.

Let’s just say the rule is “no residue can have more than four connected neighbors”, and hope for the best.

That said, there’s another somewhat subtler constraint that carries over from the self-avoiding walk thing. I can imagine there are branching chains that can no longer fit in the space of a lattice. Consider something like this one:

:       H
      | | |
      | | |

That third layer of H branches is too crowded, and inevitably the neighbors of the north P will interfere the neighbors of the west and east P. I’ve shown those as X. This is not allowed by the self-avoiding part of our lattice rules. This polymer isn’t even feasible, given our constraints; there’s no way to place all the pieces on the checkerboard.

Ignore the fact that the residues are still P and H. Can you come up with some general-purpose rules we can use to tell, quickly and easily, if a particular branching polymer in HH[H][H]HH square-bracket nomenclature is valid or not? Obviously “not having more than four neighbors” is one rule. What kind of algorithmic implementation of the self-avoiding constraint-checking rule can you write, using the string representation as an argument? In other words, do you need to simulate the layout somehow to check, or can you use a simpler heuristic, like “no nested branches deeper than 3”?

Is “no nested branches deeper than three” specific enough? Or is it something more about the distance between branch points?

Play with that for a while. It’s important and interesting, and might even be worth your time to puzzle over. What are the explicit rules for these branching monsters which emerge from the fundamental constraint of self-avoiding square lattices?

In part two of this two-parter, we’ll talk about various types of move, by which one feasible layout can be transformed to another. Because we all want to have happy little polymers, don’t we?