Draft

Research and recreation and lattice proteins

Draft of 2016.11.23

May include: visualization, programming, &c.

This is another in my ongoing series of explorations and sketches of lattice proteins. [first, second]

As I said at the start, I’m looking over some 25-year-old notebooks from my days as a theoretical biochemist. For some reason, unlike L-systems and cellular automata and other familiar simulation systems, lattice protein models never made the “jump” from biochemistry and biophysics to “mainstream” complex systems toys. I’ll try to address why in my later long-form thing… but here’s my increasingly Abbottian hypothesis:

Lattice proteins—unlike Wolfram’s linear cellular automata, and many other popular “complexity standards”, like fractals and chaotic maps and so on—were proposed and investigated in the context of explaining a real complex process occurring in nature: protein folding. The “turf” being colonized by the disciplines of Biochemistry and Biophysics with lattice protein models had only recently been “opened up” by the development of fast digital computing, and the reachable “payoff” for that original work was the formulation of some core notions in modern Biophysics: the ideas of folding funnels and energy landscapes turned out to be immensely fruitful in context. Which is to say: compared to trying to write papers about the immensely complicated long-chain biopolymers in poorly-understood thermal solutions, systems with literally millions of interacting atoms, researchers in the discipline were able, with lattice protein models, to write about substantially more tractable stylized systems.

In other words, as a Biochemist you can work with these highly stylized lattice proteins and actually do interesting new work in the discipline-appropriate fields of thermodynamics, entropy, multi-scale solution dynamics, ensemble behavior and so forth. All of these make perfectly decent in-discipline publishing material. As a Biochemist, one need only be careful in programming one’s simulations that no unnatural biases or unrealistic jankiness get built in along the way; sure, everybody in the field understands that real proteins are much more complicated, and don’t fall on a lattice, and that the chemistry is greatly simplified. But those are assumptions that keep you “on-topic”.

If anybody ever stumbled across any biases or jankiness as they worked their projects within the originating disciplines, those might have warranted a mention as something to avoid, but only in passing, in the comments of a paper reporting unbiased, non-janky successes. Nobody would waste time looking into an unrealistic variant of these models.

And that, I think, might be why lattice proteins never made the “jump” to mainstream Complex Systems work. In just a few days playing with these models I’m seeing all kinds of fascinating biases and emergent behavior we can explore. The 25-year-old notes I discovered, in which I was drawing on graph paper by hand and didn’t have the skills or tech to write anything in code, indicate that even back then I was thinking about these off-label “uses” of lattice protein models.

For example, here’s my original student’s question, before I was fully indoctrinated into an Academic mindset: “What do these models do if there’s a bias in the chain’s movements?”

Fainting in coils, order for free

Here are two minor variations on the sketch I showed last time.

The representation and dynamics of the chain are similar to those of the sketch I showed you last time. The only substantial differences are that (1) there are a lot more “beads” on this chain, and (2) instead of changing a randomly-chosen link to point in a random direction, whenever a link is changed it either bends “left” or “right”, compared to the direction it used to be pointing. In other words, I’m not letting a “random change” include “stay the same”, and I’m skipping some “reversing” moves.
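
For concreteness, here’s a minimal Clojure-flavored sketch of that move rule. It is not the sketches’ literal code; the representation (a vector of direction keywords, the same [:east :east ...] notation I use below) and the names turn-left, turn-right and propose-move are placeholders of mine.

```clojure
;; A minimal sketch of the move rule, not the sketches' literal code:
;; the chain is a vector of direction keywords, and a "move" re-bends
;; one randomly chosen link a quarter-turn left or right.
(def turn-left  {:north :west, :west :south, :south :east, :east :north})
(def turn-right {:north :east, :east :south, :south :west, :west :north})

(defn propose-move
  "Bend one randomly chosen link left or right; never 'stay the same',
  never reverse."
  [chain]
  (update chain (rand-int (count chain))
          (rand-nth [turn-left turn-right])))
```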

So here’s an unbiased floppy chain, with equal probabilities of any given link bending left or right at each tick of the simulation clock. As before, I’m discarding any attempted change that would result in a collision somewhere along the chain.
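
And here, hedged the same way, is the “discard collisions” part: lay the beads out from an arbitrary origin and keep a proposed move only if no two beads land on the same lattice site. Again, bead-positions, self-avoiding? and tick are names I’m making up for this sketch.

```clojure
;; Hedged sketch of the rejection step: walk the chain from an arbitrary
;; origin and keep a proposed move only if no two beads collide.
(def step {:north [0 1], :south [0 -1], :east [1 0], :west [-1 0]})

(defn bead-positions
  "Every bead's [x y] position, starting at the origin."
  [chain]
  (reductions (fn [[x y] dir]
                (let [[dx dy] (step dir)]
                  [(+ x dx) (+ y dy)]))
              [0 0]
              chain))

(defn self-avoiding? [chain]
  (apply distinct? (bead-positions chain)))

(defn tick
  "One tick of the simulation clock."
  [chain]
  (let [proposal (propose-move chain)]
    (if (self-avoiding? proposal) proposal chain)))
```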

For the sake of foreshadowing, let me emphasize that in this sketch, there is an equal likelihood of any given link turning left or right.

Now in the original setting of Biochemistry and Biophysics models, we could get away with saying that an unbiased chain like that is “realistic”, at least in a stylized sense. I could say, for example, that it’s truly indifferent to interacting with itself; in this simple form, it has none of the interaction effects you’d add to make it more like the original hydrophobic-polar (HP) model. Nothing is “sticky”, no bead “repels” another, the beads never even talk to one another, except insofar as the chain constrains their movements and positions.

You can even use it to make some interesting observations about entropy and molecular dynamics. For instance, I’ve started it here in a sort of folded-paper configuration: a bunch of layers with tight switchbacks. I’m confident that in your unique browser experience of its dynamics, it “poofed out” into a sinuous S-shaped curve of some sort. It straightened all by itself, in other words, even though there’s nothing at all in the simulation that makes beads “dislike” being near one another.

That’s an example of entropy in action. Of all the possible changes that could be made to the initial configuration, very few result in permitted non-crossing layouts, and most of those are the ones that “unfold” the compressed form. This inherent “bias” might bother you, but it’s basically telling the story of the entropic elasticity we observe in regular old polymer chemistry. There’s nothing special about the folded-up paper starting configuration… except that there are far more ways for it to legally unfold than there are for it to legally crumple up even tighter.

By the same argument (and as Ron and I saw in practice yesterday when we were playing with this), when I start the chain off in a perfectly straight line of [:east :east ...] connections, it quite obviously starts to “crumple up” right away. That’s just the same reasonable-sounding stylized fact about thermodynamics: there’s only one perfectly straight eastwards configuration among all the \(4^{229}\) possible assignments of directions to this sketch’s 229 links (fewer, once you throw out the self-crossing ones, but still astronomically more than one). As we move randomly away from that lone straight configuration, the whole will tend to get less straight, and it will of necessity “crumple”.
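
For concreteness, here’s one way I might build both of those starting configurations under the same assumptions as the sketches above; the folded-paper run length and layer count here are guesses, not the values the actual sketch uses.

```clojure
;; Hedged builders for the two starting configurations; the folded-paper
;; run length and layer count are guesses, not the sketch's values.
(defn straight-chain [links]
  (vec (repeat links :east)))

(defn switchback-chain
  "Long east/west runs joined by single northward switchback links."
  [run-length layers]
  (vec (apply concat
              (for [layer (range layers)]
                (concat (repeat run-length (if (even? layer) :east :west))
                        (when (< layer (dec layers)) [:north]))))))

;; (straight-chain 229) and (switchback-chain 22 10) are both 229 links long.
```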

And again: there’s no particular reason for the straight chain to collapse or the compacted chain to unfurl, except for these entropic effects. It’s good, sound statistical mechanics and entropy modeling.

Complexology viewed as “making mistakes on purpose”

Speaking as a Biochemist-what-was, if I were to try to model natural “real” protein folding using an approach like this lattice polymer model, I’m sure that even if I were ignorant of the decades of prior research in the field, I’d immediately think about all we know about how amino acids interact with one another in solution, and I’d start to think about letting certain “beads” want to be closer to one another than they want to be near other beads. That’s the route that leads to the HP model: a particular configuration that brings two beads that “like” one another into adjacent positions on the lattice will be less likely to pull them apart again with a subsequent move. Over time (in the HP model), as wandering “unbiased” beads bump into one another, the happy ones tend to stick.

And, as we will no doubt see if I implement an HP-like version of this, that works fine. Over time, as random “thermal fluctuations” gradually appose certain beads, they’d agglomerate “naturally” to form a folded protein.
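
To pin down what I mean, here’s a hedged sketch of the kind of acceptance rule I’d add: a Metropolis-style stand-in rather than the published HP model, with placeholder contact energies and temperature. The point is only that “sticking” becomes a matter of acceptance probabilities, not explicit attachment.

```clojure
;; Hedged sketch of an HP-flavored acceptance rule, not the published HP
;; model. `monomers` is a vector of :H / :P labels, one per bead (so one
;; entry longer than the chain of links); the contact energy of -1 and
;; the temperature are placeholders.
(defn hh-contacts
  "Count pairs of :H beads on adjacent lattice sites that are not
  neighbors along the chain."
  [monomers chain]
  (let [pos   (vec (bead-positions chain))
        n     (count pos)
        h?    (fn [i] (= :H (monomers i)))
        near? (fn [[x1 y1] [x2 y2]]
                (= 1 (+ (Math/abs (- x1 x2)) (Math/abs (- y1 y2)))))]
    (count (for [i (range n)
                 j (range (+ i 2) n)
                 :when (and (h? i) (h? j) (near? (pos i) (pos j)))]
             [i j]))))

(defn accept-move?
  "Metropolis-style rule: always accept a move that adds H-H contacts;
  accept one that breaks them with probability e^(-ΔE/T)."
  [old-contacts new-contacts temperature]
  (let [delta-e (double (- old-contacts new-contacts))] ; energy -1 per contact
    (or (<= delta-e 0.0)
        (< (rand) (Math/exp (/ (- delta-e) temperature))))))
```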

But that’s not the only way to fold one of these polymers.

In this next sketch, the only difference is that there is a ten-fold bias towards a link turning right rather than left. As it runs, you’ll see the whole chain swirling clockwise around its attached central tail, as you might expect.
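
In terms of the sketch above, that’s a one-line change; the 10:1 weighting is the only thing I’m actually asserting here.

```clojure
;; The biased version of the move rule: a right turn is ten times as
;; likely as a left turn. Everything else stays the same.
(defn propose-biased-move [chain]
  (update chain (rand-int (count chain))
          (if (< (rand) 10/11) turn-right turn-left)))
```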

But there’s another phenomenon you’ll also see, when you watch long enough. Soon there will start to be little “knots” forming along the chain. These are filled tight with spiral-packed chains in a sort of “Greek key” arrangement. Even with these knots, though, the chain will still in general keep swirling around… until two knots start to bump into one another. Then those will become “tangled” into a larger knot of some sort.

Ultimately, the chain will have “folded” into a very compact shape.

The same effect appears (though much more slowly) if I introduce even a small bias between left and right turns in the bending dynamics.

If you scroll back up and look at the unbiased sketch, that will almost certainly still be flopping around in a kind of half-collapsed, dropped-string layout. It will eventually lose the right-left-right-left folds in which it began the simulation, but it will probably end up being sinuous on about the same scale as it started, even if those bends reverse. (I don’t know why, exactly; more on that below.)

Eventually the biased simulation will still be running, but the chain will have all but stopped moving. Maybe a little wiggle now and then, but it will end up “locked” into a spiral it can’t unroll from. The earlier, “unbiased” sketch will still be flopping around in some kind of wet-noodle shape.

I didn’t tell it to do that at all, did I? No rule that says “fall down into a ball”. Huh.

So why does the second “wrong” lattice protein fold up by itself, when the more reasonable “realistic” one doesn’t?

Well, see, that’s an interesting question.

You might spend some time thinking about answers in terms of Aristotle’s Four Causes. One of the reasons should feel as though it’s related to the clues I dropped above, when I talked about the “built in” collapse and expansion due to entropy: once the chain gets tied up in a very compact spiral, even locally, there are fewer paths by which it can legally unfold than there are for it to fold more tightly. But that’s not enough of an explanation, by a long shot.

This is, I have to say, a lovely little emergent phenomenon. There’s no rule I’ve written out that tells me that the polymer “wants” to collapse. Unlike the explicit Biochemist’s approach, I’ve added nothing along the lines of “move randomly, but when these things are close, don’t let them come apart as often as other things do”.

There is also suddenly a sense that there are many “adjacent questions” I can’t begin to answer.

clumping?

If you watch, as I have, several collapses of the biased chain (reload the page to restart it), I suspect you’ll start to see patterns in the way the clumps form along the chain. Most often I’m seeing three clumps, maybe at the old “bends” in the original layout… but maybe not.

In Structural Biology classes we’re taught that proteins collapse hierarchically in much the same way: first they clump, then the clumps tend to form globs, and finally the whole thing falls into a particular native “fold” that’s relatively stable at physiological temperatures.

But the biased model has no such rule, and yet it tends to form clumps on what appears to be a natural scale. If we were looking at the dynamics of something more like the HP model, where some beads “like” being close to each other, then we could make the argument that clumps of “sticky” beads along the chain tend to produce clumps of folded-up chain.

But here’s a way for clumps of chain to form without any explicit causative role being played by the beads themselves.

unreachable configurations?

As I watch the biased sketch, I start to realize that the “clumping” is happening because of changes in the probability of “escaping” certain local chain configurations. A sort of kink-inside-a-loop happens, sometimes, and it’s hard for the biased dynamics to “untie” that without violating the self-avoiding chain constraints.

Just as some configurations happen spontaneously but are hard to unmake, there may be local configurations that are much harder to construct, and which fall apart quickly if they form at all. This right-turning bias makes right-aligned “Greek key” blobs; I bet a “left-turning” bias would have a much harder time building right-aligned “Greek key” blobs like those we see here.

What, therefore, are the “unreachable” or “unstable” configurations in the original “unbiased” dynamics? For simplicity, Ron and I picked a very simple kind of move: one chain link changes direction at a time. There’s a sort of common sense feeling—quite possibly unsupported by reality or mathematics—that eventually a chain of any length might reach any configuration by a series of consecutive single-link bends.

Well, yeah. “Eventually” is a tricky thing, isn’t it? So for example, I bet configurations with long straight runs, like the folded-paper one I started these sketches with, are rare, and hard to reach, and fall apart quickly.

A pretty safe bet, since we’re watching one fall apart quickly right now.

I wonder if there is a small change I could make to the “unbiased” dynamics that might promote that sort of rare configuration. For example, I wonder what might happen if I constrained where the random changes happen, so that successive changes tended to land farther apart along the chain, or farther apart in 2d space on the lattice, or in some other arrangement? Or what might happen if we made two bends in the chain at each step, instead of just one? What would happen if instead of picking locations to fold at random, I checked each position along the chain in order, “start” to “end” (see the sketch just below)? What might happen if we folded the chain over, as if it were a wire instead of a string?
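
To show how small these variant experiments are, here’s a hedged sketch of just the “check each position in order” version, reusing the pieces above; sweep-tick is my own placeholder name, not code from the sketches.

```clojure
;; A hedged sketch of one variant: instead of bending one random link
;; per tick, sweep along the chain from "start" to "end", offering each
;; link a left-or-right bend and keeping it only if the chain stays
;; self-avoiding.
(defn sweep-tick [chain]
  (reduce (fn [c i]
            (let [proposal (update c i (rand-nth [turn-left turn-right]))]
              (if (self-avoiding? proposal) proposal c)))
          chain
          (range (count chain))))
```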

more questions than answers?

What would a different lattice do? What about three dimensions, or more? What about branched polymers (which is where I’ll go soon, as you’ll recall)? What about other dynamics we haven’t even explored yet, which aren’t “natural” but might be better or worse at finding collapsed states and exploring constrained compressed forms? What about… all the other things we could have done?

“Research” and “Recreations”

I hope you have a sense that there are many more questions like those.

A lot of folks describe Complex Systems research as being interdisciplinary, in the sense that it blends models and techniques from at least one empirical discipline and a mathematical or computational approach that would otherwise be unfamiliar in the original discipline. The social dynamics of this “blending” and the way various academic and professional disciplines approach it are complicated and interesting in their own right, and I hope to write extensively on that some time soon. For now, as I move farther into some of these peripheral questions and away from “biochemistry models”, I want to talk a bit about what I’m doing and why.

Within an academic discipline, professional obligations mean one has to work on culturally appropriate research projects. Over the last few decades, computational approaches have infiltrated most disciplines in Science, the Arts and Humanities, and there have been resulting gradual shifts in the focus of disciplinary research.

“Gradual” because academic research wants above all to be normative. Incrementalism is almost always a kind of tradition-building.

Interestingly, though, the computational infrastructure we—all of us, especially the non-Academic “us”—have developed over the last few decades makes it possible to quickly and easily build things. Programs and simulations and experiments and sketches like those two up there in this essay. Those two were about five hours’ work in total for me, most of which was me reminding myself about finicky details of badly-documented libraries, not actually “writing code”.

As computational projects have become easier to undertake and explore, more of us are able to build complicated (and complex) simulations. That ease helped give rise to the field of Complex Systems… and also many of its excesses. Once you have a suitably “general” model of dynamics, one that produces an interesting emergent phenomenon that’s analogous to some observed real-world dynamical phenomenon, it’s easy to forget that the process of simplification that permitted your experiment might not be entirely reversible.

Sometimes a problem isn’t a fitness landscape.

Some sciences (and many social sciences) learned long ago that the world is a contingent place, and that generalizing too far back up the chain from specific observations is fraught. Other sciences (and some social sciences) saw early successes in the 20th Century from broad, aggregate models that rely on an assumption of homogeneity and a desire for universality: that everything is mixed up real well, everywhere and all the time. The former kind of work likes anecdotes, field observations, anomalies, interactions between different “layers”, long-term nonuniform networks, things that aren’t lumpable into single continuous variables. The latter kind of work likes analytical formulations, differential equations, mean-field theories, assumptions of random initial conditions, eventual consistency and expected values.

For the moment, call these “Well, It Depends” vs “Big Data Finds Better Rules”. How would you describe these little sketches we’re exploring together, on this axis? Probably well over towards “Well, It Depends”, and far away from “Big Data Finds Better Rules”, right?

But maybe that’s not even a fair question. I’m using this dichotomy and axis to talk about research. Are you “doing research” if you write your own lattice protein simulation and watch to see what happens?

I want to say no—but that doesn’t mean I don’t think this work is valuable. Indeed, over the next few essays I want to make the case for why I think it’s more crucial now than ever for us to undertake and share this work, this “play” work that isn’t much like “research” at all.

Call them “recreations”. Or maybe call them “katas”.

By the way, that word “recreations” was (with its cognates) one of the most painful accusations thrown at Complex Systems researchers’ work by traditional disciplines back in the day: automata and fitness landscapes and network theory and all that combinatorial simulation were greeted with enthusiasm by disciplinary academicians when they first opened up new prospects, but very soon entrenched research programs came to call it all “play”. And while it might not be a cause-and-effect relationship, I’ve watched over the last few years as the trend in Complex Systems research has been to push strongly over towards “Big Data Finds Better Rules”. Less play, more traditional universals and general-purpose tools. And as a result, some of the facets of the research—especially the stuff that doesn’t depend on context—are much better integrated into mainstream science and engineering practice. Other facets are not even explored these days.

But you and I, we are here to play. These things I’m doing (and the things that scattered, inspiring colleagues, many of whom I haven’t yet met in person, are doing in their own work) are absolutely, unquestionably, and unrepentantly recreations.

And I want us to own that word.

Think of it this way: What do you learn when you write your own implementation of Conway’s Game of Life? I bet almost every reader has written a computer program that implements that simple algorithm, and then stared at it for a long time. And the ones that haven’t, who have just been given the link? They probably should go write their own right now.

Why do we do that work, though? The mathematics and statistical mechanics of the Game of Life are well-understood, and have been for decades. There are perfectly good search algorithms that are highly efficient at finding particular “interesting” configurations with specified behavioral properties; there’s software you can download and run to search for glider guns, for example.

What good does it do to spend time writing the Game of Life from scratch?

I said the word kata. That might be a clue to my long game here.

I want you to think about that. I know you’re here because you feel the drive to explore these little things yourself, even if it’s just to the point of watching particular simulations wiggle around on the screen. I want, specifically, for you to think introspectively about insight and understanding, and the ways in which you, personally, come to understand the world by watching it unfold, as opposed to reading a stylized academic research report with aggregate statistics in a table, or numbered equations or phase space diagrams.

In these days when academic research as a whole seems to be moving inexorably from “Well, It Depends” towards “Big Data Finds Better Rules”… maybe what we’re doing here isn’t really “research” at all. Maybe it’s good that it’s more recreational, more personal, and a lot more visceral.

Maybe what we’re doing here is learning the rules, as opposed to being told the rules. Or maybe it just depends.

You think on that. Think about how a bias in turning right instead of left can wad up a string into a ball, and how, lacking such a bias, the chain has a much harder time doing so.

I’ll be back soon, with more questions.