Notes on rewriting real numbers
Draft of 2019.04.27
May include: rewriting systems ↗ mathematics ↖ &c.
I seem to be thinking about infinite sequences a lot lately. We glanced at some little bits and bobs of Real Analysis in my high school and college mathematics, but I realize that I’m nowhere near “trained” in these matters. I wouldn’t know my ass from a hole in the ground here, as they say.
Which is just to say: these are, as always, just my notes of musings and pokings at a field that’s almost certainly lively and full of well-considered material we should find together.
We’re all used to the decimal representation scheme for real numbers, even when the number ends up being infinitely long in that scheme. For instance, most of us are happy enough to see 1/3 written out as
0.333333…
or π written out like
3.14159265…
on the understanding that in the first case the digits repeat, and in the second that they do not, and that quite a bit of “stuff” happens offstage to the right.
I have no doubt that π is probably the most familiar example of an irrational number; for interesting historical and sociological reasons it’s probably also the most talked-about. There is a panoply of algorithmic ways of producing the digits of its decimal expansion, some straightforward and some exotic, some elegant and others counterintuitive to the point of being magic tricks.
There are other familiar faces on the super-team of well-known irrationals: √2, e (aka “Euler’s number”), φ (the Golden Ratio), and so on. There’s a sizable winner’s circle (often having to do with circles in one way or another), and a lot of less-familiar close calls that seem to come up in calculations now and then.
There are little (but still infinite) subsets of irrationals that deserve their own special names, like the trigonometric numbers, the algebraic numbers, the transcendental numbers, the periods, and more.
These bunches of irrational numbers are mainly named, and particular numbers are proven to be members or not, according to the ways in which they can be generated.
Of the reals¹, the rationals are that subset which can be represented as a fraction with finite integer numerator and denominator, which of course includes the integers. The proof that some particular number is rational or irrational boils down to a proof that there is or is not a pair of integers that can be the numerator and denominator of such a fraction, whether by producing those integers or by some more roundabout path.
Of the reals, the algebraic numbers are those which are solutions to polynomial equations (zeroes of functions built from rational constants by addition, subtraction, multiplication, and exponentiation with positive integer powers). These include the integers, and the rationals, and some irrational numbers like √2 and so on. The proof that some particular number is algebraic or transcendental boils down to a proof that there is or is not a way of producing it by solving such a polynomial, or that it is a member of a better-known subset of the algebraic numbers, or perhaps by some more roundabout path.
As a side-effect of what we mean by standard numerical notation, there’s another heuristic we can use to easily differentiate a rational from an irrational number. In standard notation, the fractional digits—the ones to the right of the decimal point—of any rational number will end up in a periodic attractor.
This can be a trivial attractor, too. I can write the number 1/2 as 0.5, but this “really means” something more like “0.5000…”, which can be read to mean “an infinite string of repeated zeroes to the right”. Whether I write 1/2 as 0.5 or as 0.5000…, or 1/3 as “0.3 repeating” or 0.333…, or 1/7 as 0.142857142857…, in each case we reach a permanent cycle of digits at the right “end” of the fractional part, for all rational numbers.
For irrational numbers, this doesn’t happen. We cannot write π or e or √2 as a decimal value that has a repeating sequence of any finite length to the right of the decimal point.
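That periodic-attractor behavior is easy to watch happen with ordinary long division. Here’s a small Python sketch (my own helper, not anything standard) that splits a fraction’s decimal expansion into its non-repeating prefix and its repeating block; a terminating decimal is just the case where the block is the trivial all-zeroes one:

```python
def repeating_block(numerator, denominator):
    """Long-divide numerator/denominator; return (prefix digits, repeating block)."""
    seen = {}            # remainder -> index in digits where it first appeared
    digits = []
    r = numerator % denominator
    while r and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(str(r // denominator))
        r %= denominator
    if not r:            # division terminated: the "repeat" is all zeroes
        return "".join(digits), ""
    start = seen[r]
    return "".join(digits[:start]), "".join(digits[start:])

print(repeating_block(1, 7))   # -> ('', '142857')
print(repeating_block(1, 6))   # -> ('1', '6')
print(repeating_block(1, 2))   # -> ('5', '')
```

The cycle is forced: there are only finitely many possible remainders, so long division must eventually revisit one, and from there the digits repeat forever.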
Non-periodic rewriting systems
For some of the most computery people here, the Prouhet-Thue-Morse constant—or possibly the Thue-Morse sequence that produces it—may also be familiar.
I first came across it decades ago in a book about Lindenmayer Systems, so let’s start there.
Suppose I have a string, I’ll call it the “seed”, and in this case let’s say it contains a single digit "0".
I also have a set containing some substitution rules, which I will apply to every matching pattern in a string, synchronously. For the Thue-Morse L-system, here are those rules, where I’ve shown the left-hand “match string” in the syntax of a familiar regular expression, and the right-hand side as the specific string that replaces the matched pattern(s):
a: /0/ > "01"
b: /1/ > "10"
To make things clearer later on, I’ve given the rules unique names. In order to indicate where substitutions are happening, I’ll break the application of the rules into two phases: one for pattern-matching, where I show where the left-hand sides have been recognized by each rule, and another for rewriting, where we substitute in the right-hand sides.
step  string
0     0
      a
1     01
      ab
2     0110
      abba
3     01101001
      abbabaab
4     0110100110010110
      abbabaabbaababba
5     01101001100101101001011001101001
...
In each step, I’ve found every matching pattern from a rule—and in this example, every character in the string is matched by some pattern, though that will not always be true—and replaced it with the name of the rule that “caught” it. And then I’ve replaced each rule name with the right-hand side of the rule. The two-phase steps “grow” a string which, as you can already tell, becomes infinitely long in the limit.
That’s the Thue-Morse sequence. By a quirk of the rules’ design, it looks as though it’s only “growing” at the right-hand end, but in fact each character is doubling in each step. Here I’ve hidden the little intermediate steps to highlight that:
step  string
0     0
1     01
2     0110
3     01101001
4     0110100110010110
5     01101001100101101001011001101001
...
But be aware that what’s “really” happening should look more like this in your head:
step  string
0     0
1     0 1
2     0 1   1 0
3     0 1 1 0   1 0 0 1
4     0 1 1 0 1 0 0 1   1 0 0 1 0 1 1 0
5     01101001100101101001011001101001
...
In any case, what we get is a “growing” sequence of characters. If we take those characters as the digits of a fractional number in base 2 notation, we get a series of approximations that gradually approach the value of the Prouhet-Thue-Morse constant:
step  decimal                 binary
0     0.0                     0.0
1     0.25                    0.01
2     0.375                   0.0110
3     0.41015625              0.01101001
4     0.412445068359375       0.0110100110010110
5     0.41245403350330889225  0.01101001100101101001011001101001
...   ...                     ...
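The two-phase rewriting process is short to sketch in code. Here’s one minimal Python rendering of the Thue-Morse L-system and the base-2 reading of its output (the function names are my own):

```python
# A minimal sketch of the Thue-Morse L-system: every character is rewritten
# synchronously, 0 -> 01 and 1 -> 10, starting from the seed "0".
RULES = {"0": "01", "1": "10"}

def rewrite_step(s):
    """Apply every substitution rule to every character of the string at once."""
    return "".join(RULES[c] for c in s)

def thue_morse(steps, seed="0"):
    """Grow the seed through the given number of rewriting steps."""
    s = seed
    for _ in range(steps):
        s = rewrite_step(s)
    return s

def as_binary_fraction(bits):
    """Read a string of 0s and 1s as the fractional digits of a base-2 number."""
    return sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))

print(thue_morse(3))                      # -> 01101001
print(as_binary_fraction(thue_morse(3)))  # -> 0.41015625
```

Because the rules here rewrite single characters independently, the “synchronous” application is just a character-by-character map; rules with longer match patterns need the more careful machinery discussed below.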
This constant is irrational, and also transcendental, and it’s interesting for a variety of reasons that don’t concern us here. What I want to do here is focus on that algorithmic process—the L-system one.
More substitution rules
I’ve written before about this kind of substitution system, but let me present a few more variants on the simple contextless rules I’ve been using so far.
The match patterns can have wildcards, just like [most] regular expression matchers:
a: /0(.)0/ > "010"
b: /0+/ > "9"
c: /0[1-4]/ > "05"
Rule a replaces any single character that appears between two (adjacent) zeroes with a "1". Rule b replaces any string of one or more consecutive zeroes with a single "9". Rule c replaces any of the patterns in {01, 02, 03, 04} with the specific string "05".
We can grow or shrink the patterns we match:
a: /0123/ > "1"
b: /456/ > "445566"
c: /7/ > ""
Rule a replaces an occurrence of the exact string 0123 with a single "1". Rule b replaces the exact string 456 with "445566". Rule c deletes the character 7.
We can “capture” parts of the matched pattern:
a: /0(.)(\1)/ > "1\1\1"
b: /4(5+)6/ > "\146"
c: /(.)(.)\2\1/ > "\2\1\2\2\1\2"
Rule a matches any string of three characters that starts with a zero, followed by two copies of the same character, and replaces the zero with a "1" (leaving the following digits unchanged). Rule b matches a string of one or more 5 digits, preceded by a 4 and followed by a 6, and moves the 4 to the indicated position just after the string of fives. Rule c matches a palindromic sequence of the form abba and replaces it with the same characters in the sequence babbab. So for example 8778 would be replaced by 787787, and so on.
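The capture-group rules work in Python too, though they need Python’s own backreference spelling; in particular, \g<1> sidesteps the ambiguity of the informal \146 (“group 1 followed by the digits 46”):

```python
import re

# Rule a: a zero followed by two copies of the same character; the zero
# becomes a "1" and the doubled character is kept via the backreference.
print(re.sub(r"0(.)\1", r"1\1\1", "055"))              # -> 155
# Rule b: a run of 5s bracketed by a 4 and a 6; the 4 moves after the 5s.
print(re.sub(r"4(5+)6", r"\g<1>46", "45556"))          # -> 55546
# Rule c: a four-character palindrome abba becomes babbab.
print(re.sub(r"(.)(.)\2\1", r"\2\1\2\2\1\2", "8778"))  # -> 787787
```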
You may already be playing with these rules, and might have noticed that there is an issue with conflicting rules in the schemes I’ve laid out. What about a grammar like this one?
a: /0/ > "01"
b: /01/ > "11"
c: /10/ > "010"
Given the string "010", which rule(s) do we apply, and where? Does a recognize the first zero, or does b recognize that zero and the 1 that follows it?
And how about this one?
a: /0/ > "01"
b: /0/ > "11"
Here, the same exact pattern is explicitly recognized by both rules. How do we discriminate or prioritize?
There are plenty of ways to disambiguate, but the one I’ll use here is sitting right in front of our faces: We’ll use the first matching pattern in the order the rules appear in the ruleset, and when a pattern is matched it’s immediately replaced by a placeholder that removes it from further consideration. So for example in the case of
a: /0/ > "01"
b: /01/ > "11"
c: /10/ > "010"
we will never apply rule b, but rule c can pop in and “use” a zero that would fall to rule a in some other context. We just need to make a more detailed “transactional” process. Here, watch:
step  string
0     010
      a10
      ac
1     01010
      a1010
      ac10
      acc
2     01010010
      a1010010
      ac10010
      acc010
      acca10
      accac
3     0101001001010
...
See how that plays out? In each line I’ve consumed the first matching pattern (if any) that I find in the ruleset, and then when I’ve run through the entire string I do the substitutions themselves all at once. Using a meta-rule like that, I suppose one could handle any arbitrary list of rules. Rather than applying every one, everywhere it could be done, all at the same time, we can incrementally apply them throughout the string.
OK, so sure: if the string is infinite, then there has to be an assumption that “incrementally” means “pretty much all at once, but with precedence rules”, but you get the picture.
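Here’s one possible Python reading of that transactional meta-rule, as a sketch (the function name and data layout are my own; it assumes rule names never appear in the string’s own alphabet):

```python
import re

# The "transactional" step: repeatedly find the leftmost match over all rules,
# breaking ties by rule order; swap the match for the rule's name so it can't
# be matched again; finally expand every placeholder at once.
RULES = [("a", r"0", "01"),
         ("b", r"01", "11"),
         ("c", r"10", "010")]

def transactional_step(s):
    expansions = {name: rhs for name, _, rhs in RULES}
    while True:
        candidates = []
        for order, (name, pattern, _) in enumerate(RULES):
            m = re.search(pattern, s)
            if m:
                candidates.append((m.start(), order, name, m.end()))
        if not candidates:
            break
        start, _, name, end = min(candidates)   # leftmost wins, then rule order
        s = s[:start] + name + s[end:]          # placeholder removes it from play
    # Phase two: substitute every placeholder's right-hand side simultaneously.
    return "".join(expansions.get(c, c) for c in s)

print(transactional_step("010"))     # -> 01010
print(transactional_step("01010"))   # -> 01010010
```

Note that rule b really is unreachable under this scheme: anywhere b could match "01", rule a has already claimed the leading zero.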
OK so let’s fuck around with π
For example, here’s a one-rule ruleset that converts π into a rational result.
a: /[02-9]/ > "1"
I’m using the sorta-kinda regular expression syntax here to indicate that “every digit except 1 is turned into a 1”. When I apply this rule (in parallel) to π, what happens?
step  string
0     3.14159265358979323846264338327950284197169399375...
      a.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
1     1.11111111111111111111111111111111111111111111111...
...
Well, it looks as if it turns into the decimal representation of 10/9, and then stays there for all subsequent iterations… but I’ve fudged a little here, haven’t I? I’ve ignored the infinite sequence of preceding zeroes to the left of any finite real number in this representation, so I suppose to be more formally correct I should read this to mean:
step  string
0     ...0003.14159265358979323846264338327950284197169399375...
      ...aaaa.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
1     ...1111.11111111111111111111111111111111111111111111111...
...
and that, I admit, isn’t nearly as satisfying to see.
I could do the same trick in a slightly different way, though, and get the desired result… though perhaps with a different kind of “fudging”:
a: /[02-9]/ > ""
which will do this:
step  string
0     ...0003.14159265358979323846264338327950284197169399375...
      ...aaaa.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
1     ...0000.1111...
...
If I stop there (after one step), and as long as I add a weird-sounding heuristic like “zeroes to the left of the leftmost digit of the integer part are replenished as needed”, it looks like I’ve “converted” π into the decimal representation of 1/9.
Making the assumption that π is a normal number, and with a little more syntax in the L-system definitions, I should be able to play similar tricks to produce other rational numbers:
a: /4/ > "X"
b: /[0-35-9]/ > ""
c: /X/ > "142857"
Two rounds of this ruleset (plus the heuristic that permits left-filling zeroes to appear) show us that I’ve converted π into 1/7:
step  string
0     ...0003.14159265358979323846264338327950284197169399375...
      ...bbbb.babbbbbbbbbbbbbbbbabbbabbbbbbbbbbbabbbbbbbbbbbb...
1     ...0.XXXX...
      ...b.cccc...
2     ...0.142857142857142857142857...
...
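On a finite prefix of π these digit-filtering tricks become one-liners. A sketch, with the caveat that the real process acts on the whole infinite expansion and leans on the zero-backfilling heuristics above:

```python
import re

pi_prefix = "3.14159265358979323846"

# One-rule trick: every digit except 1 becomes a 1.
print(re.sub(r"[02-9]", "1", pi_prefix))

# Three-rule trick: keep only the 4s, then expand each one into the
# repeating block of 1/7.
s = re.sub(r"4", "X", pi_prefix)      # rule a: mark the 4s
s = re.sub(r"[0-35-9]", "", s)        # rule b: every other digit vanishes
s = re.sub(r"X", "142857", s)         # rule c: each mark becomes the block
print(s)                              # -> .142857142857
```

The two 4s in this short prefix yield two copies of the repeating block; a longer prefix (given normality, with 4s recurring forever) keeps the pattern going.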
Arguably the “fill to the left with zeroes” heuristic has a companion: “fill to the right of the last nonzero digit with zeroes”. We can convert π (and if you think about it, every other real number) into zero with a similar approach:
a: /[0-9]/ > ""
This rule immediately converts any decimal number to a lonely little decimal point, which gets backfilled in both directions to be ...000.000... .
What a boring function, huh? And yet, it might be useful in some circumstances.
Here’s an interesting one:
a: /0/ > "0"
b: /1/ > "0"
c: /2/ > "0"
d: /3/ > "0"
e: /4/ > "0"
f: /5/ > "1"
g: /6/ > "1"
h: /7/ > "1"
i: /8/ > "1"
j: /9/ > "1"
step  string
0     ...0003.14159265358979323846264338327950284197169399375...
1     ...0000.00011011011111000101010001001110010011011011000...
...
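Those ten rules just threshold each digit, which can be sketched compactly (the helper name is mine):

```python
# The ten threshold rules in one map: digits 0-4 -> 0, digits 5-9 -> 1,
# everything else (the decimal point) passes through untouched.
def threshold(s):
    return "".join(("0" if c < "5" else "1") if c.isdigit() else c
                   for c in s)

print(threshold("3.14159265"))   # -> 0.00011011
```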
I’m going to suggest this is probably an irrational number, as long as there is no point in the decimal expansion of π where the digits irreversibly start to be 4 or lower (or, for that matter, 5 or higher).
Then there are the “echoes” of π, like
a: /0/ > "00"
b: /1/ > "11"
c: /2/ > "22"
d: /3/ > "33"
e: /4/ > "44"
f: /5/ > "55"
g: /6/ > "66"
h: /7/ > "77"
i: /8/ > "88"
j: /9/ > "99"
step  string
0     ...0003.141592653589793...
      ...aaad.bebfjcgfdfijhjd...
1     ...00033.114411559922665533558899779933...
...
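The echo ruleset collapses to a single capturing substitution, sketched with re.sub:

```python
import re

# The "echo" ruleset in one rule: every digit doubles in place.
print(re.sub(r"(\d)", r"\1\1", "3.141592653589793"))
# -> 33.114411559922665533558899779933
```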
and so on…
Indeed, there seems to be a lot of leeway here. It feels as though we can convert an irrational number like π into a rational one with certain well-planned rulesets, though it may not be simple or even possible in the general case. But it’s certainly true that we can substitute globally in the infinite sequence of digits and produce a variety of other derived irrational numbers as well. And while we could keep reapplying a ruleset forever, we don’t seem to need to do so in order to change from irrational to rational values.
Maybe the other way around we do, as in the case of the Thue-Morse sequence. There, we’re building a series of rational numbers that gradually approximate the irrational constant in the limit.
Those difficult problems I threatened
So here are some open questions:
Is there a finite set of rules, applied a finite number of times (but possibly more than once), which can transform π into e? That is, can we make the decimal string "3.14159..." into "2.71828...", with absolute precision?
That seems so unlikely that it’s intriguing, if you ask me. I mean… no, surely not.
Well, OK, how about this one:
Is there a finite set of rules, applied an infinite number of times, which can transform π into e? That is, can we make the decimal string "3.14159..." into "2.71828...", with absolute precision, as the limit of an iterative global replacement process like the one used to produce the Thue-Morse sequence?
Hmm…. Maybe? That’s an odd one.
But one more:
Is there an infinite set of rules, each with a finite match pattern and substitution string, which can be applied a finite number of times, which can transform π into e?
You may notice that I haven’t mentioned a finite set of rules with infinite match patterns… because hey, that’s trivial! Consider this super-dooper cheaty one:
a: /3.14159.../ > "2.71828..."
That’s trivial. One wonders if there are less… heavy-handed variations, though.
There is a different facet of these substitution-functions. What does one look like, applied to (for instance) the domain of the reals?
For example, take the “echoes” rule I mentioned above:
a: /0/ > "00"
b: /1/ > "11"
c: /2/ > "22"
d: /3/ > "33"
e: /4/ > "44"
f: /5/ > "55"
g: /6/ > "66"
h: /7/ > "77"
i: /8/ > "88"
j: /9/ > "99"
What does that function “look like” when plotted for all real x?
Related and probably worth mentioning
It’s commonly assumed (though not yet proven) that π is a normal number. That is, all finite combinations of digits appear somewhere in the expansion.
The first and most accurate approximation of π appears in the decimal expansion of the number, starting from the first position. That is, "3.14159..." is the number, and so it’s the best possible approximation.
But we can assume that somewhere later on there is a sequence 3141, and somewhere a sequence 314159, and so on. How many contiguous digits of π might appear in π itself?
Considering every contiguous substring of the decimal expansion of π, we can measure its “latency” (distance from the start of the fractional part) and its “error” (absolute difference from the actual value of π, suitably scaled).
So for example, here is a little table of a few four-digit substrings:
3.141592653589793...

substring  latency  error = |(π/10) − (i/10000)|
1415       1        0.172659...
4159       2        0.101741...
1592       3        0.154959...
5926       4        0.278441...
9265       5        0.612341...
2653       6        0.048859...
6535       7        0.339341...
5358       8        0.221641...
3589       9        0.044741...
5897       10       0.275541...
8979       11       0.583741...
9793       12       0.665141...
...
You can see that among these first few four-digit substrings, 4159, 2653 and 3589 seem to be the mutually-nondominated ones. That is, no other in this brief list is both closer to the start, and numerically closer to 31415... .
What substrings of the fractional part of π (in base 10 notation, for now) are mutually nondominated with regards to latency and error? Notice that here I’m only talking about the fractional part, to the right of the decimal point, and so what I mean more specifically is the absolute distance between π/10 and the ratio between the integer value of a given substring of k consecutive digits and 10^k. So for example, the substring 5358, with four digits, produces an “estimate” of 5358/10000 or 0.5358, which is then compared to π/10 = 0.31415926... to produce the absolute error measure.
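Here’s a sketch of that latency/error bookkeeping over four-digit windows of a short π prefix (the helper names are my own; a serious search would scan much further into the expansion):

```python
# Latency/error bookkeeping for four-digit substrings of a short prefix of π.
PI_FRAC = "141592653589793"         # fractional digits of π, as far as we look
PI_OVER_10 = 0.3141592653589793     # the comparison target, π/10

def windows(digits, width=4):
    """Yield (latency, substring, error) triples for each contiguous substring."""
    for i in range(len(digits) - width + 1):
        sub = digits[i:i + width]
        error = abs(PI_OVER_10 - int(sub) / 10 ** width)
        yield (i + 1, sub, error)

def nondominated(rows):
    """Keep rows that no other row beats on both latency and error at once."""
    return [r for r in rows
            if not any(o[0] < r[0] and o[2] < r[2] for o in rows)]

rows = list(windows(PI_FRAC))
for latency, sub, error in nondominated(rows):
    print(latency, sub, round(error, 6))
# -> 1 1415 0.172659
#    2 4159 0.101741
#    6 2653 0.048859
#    9 3589 0.044741
```

Under this strict reading of dominance the very first substring, 1415, also survives (vacuously, since nothing sits closer to the start), alongside the 4159, 2653, and 3589 called out above.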
This provokes a glib followon question or two:
What is the longest sequence of digits reproducing π that appears in the fractional part of π itself?
And also:
How much of π do you find in e? In √2?

Everything I’m writing here applies to the more general collection of complex numbers as well; I’m simplifying (probably unnecessarily) to get to a point. ↩