Draft

Notes on rewriting real numbers

Draft of 2019.04.27

May include: rewriting systems, mathematics, &c.

I seem to be thinking about infinite sequences a lot lately. We glanced at some little bits and bobs of Real Analysis in my high school and college mathematics, but I realize that I’m nowhere near “trained” in these matters. I wouldn’t know which one is the hole in the ground, for example.

Which is just to say: these are, as always, just my notes of musings and pokings at a field that’s almost certainly lively and full of well-considered material we should find together.


We’re all used to the decimal representation scheme for real numbers, even when the number ends up being infinitely long in that scheme. For instance, most of us are happy enough to see 1/3 written out as

0.33333...

or π written out like

3.14159...
on the understanding that in the first case the digits repeat, and in the second that they do not, and that quite a bit of “stuff” happens offstage to the right.

I have no doubt that π is probably the most familiar example of an irrational number; for interesting historical and sociological reasons it’s probably also the most talked-about. There are a panoply of algorithmic ways of producing the digits of its decimal expansion, some straightforward and some exotic, some elegant and others counterintuitive to the point of being magic tricks.

There are other familiar faces on the super-team of well-known irrationals: e (aka “Euler’s number”), φ (the Golden Ratio), √2, and so on. There’s a sizable winner’s circle (often having to do with circles in one way or another), and a lot of less-familiar close calls that seem to come up in calculations now and then.

There are little (but still infinite) subsets of irrationals that deserve their own special names, like trigonometric numbers, the algebraic numbers, the transcendental numbers, the periods, and more.

These bunches of irrational numbers are mainly named, and the numbers they contain are proven to be members or not, according to the ways in which they can be generated.

Of the reals[1], the rationals are that subset which can be represented as a fraction with integer numerator and (nonzero) integer denominator, which includes the integers of course. The proof that some particular number is rational or irrational boils down to a proof that there is or is not a pair of integers that can be the numerator and denominator of such a fraction, whether by producing those integers, or by a more roundabout path.

Of the reals, the algebraic numbers are those which are solutions to polynomials (zeroes of functions built from rational constants, addition, subtraction, multiplication, and exponentiation with nonnegative integer powers). These include the integers, and the rationals, and some irrational numbers like √2 and so on. The proof that some particular number is algebraic or transcendental boils down to a proof that there is or is not a way of producing it by solving such a polynomial, or that it is a member of a better-known subset of the algebraic numbers, or perhaps by some more roundabout path.


As a side-effect of what we mean by standard numerical notation, there’s another heuristic we can use to easily differentiate a rational from an irrational number. In standard notation, the fractional digits—the ones to the right of the decimal point—of any rational number will end up in a periodic attractor.

This can be a trivial attractor, too. I can write the number 1 as 1, but this “really means” something more like “1.000…”, which can be read to mean “an infinite string of repeated zeroes to the right”. Whether I write 1 as 1 or as 1.000…, or 1/3 as “0.333…”, or 1/7 as 0.142857142857…, in each case we reach a permanent cycle of digits at the right “end” of the fractional part, for all rational numbers.

For irrational numbers, this doesn’t happen. We cannot write √2 or e or π as a decimal value that has a repeating sequence of any finite length to the right of the decimal point.
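
Here’s one way to see the periodic-attractor claim in code, a minimal long-division sketch (the function name and details are my own): dividing by d leaves only d possible remainders, so the digit stream must eventually revisit a remainder and cycle from there.

```python
def fractional_digits(n, d):
    """Long-divide n/d and split the fractional digits into
    (non-repeating prefix, repeating cycle)."""
    digits = []
    seen = {}          # remainder -> position where we first saw it
    r = n % d
    while r and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(str(r // d))
        r %= d
    if r == 0:         # terminating decimal: the attractor is "0"
        return "".join(digits), "0"
    start = seen[r]
    return "".join(digits[:start]), "".join(digits[start:])
```

So fractional_digits(1, 7) gives ("", "142857"), fractional_digits(1, 6) gives ("1", "6"), and a terminating decimal like 1/4 gives ("25", "0"): the trivial attractor again.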

Non-periodic rewriting systems

For some of the most computery people here, the Prouhet-Thue-Morse constant—or possibly the Thue-Morse sequence that produces it—may also be familiar.

I first came across it decades ago in a book about Lindenmayer Systems, so let’s start there.


Suppose I have a string, I’ll call it the “seed”, and in this case let’s say it contains a single digit "0".

I also have a set containing some substitution rules, which I will apply to every matching pattern in a string, synchronously. For the Thue-Morse L-system, here are those rules, where I’ve shown the left-hand “match string” in the syntax of a familiar regular expression, and the right-hand side as the specific string that replaces the matched pattern(s):

a: /0/ -> "01"
b: /1/ -> "10"

To make things clearer later on, I’ve given the rules unique names. In order to indicate where substitutions are happening, I’ll break the application of the rules into two phases: one for pattern-matching, where I show where the left-hand sides have been recognized by each rule, and another for rewriting, where we substitute in the right-hand sides.

step  string
 0    0
      a
 1    01
      ab
 2    0110
      abba
 3    01101001
      abbabaab
 4    0110100110010110
      abbabaabbaababba
 5    01101001100101101001011001101001
 ...

In each step, I’ve found every matching pattern from a rule—and in this example, every character in the string is matched by some pattern, though that will not always be true—and replaced it with the name of the rule that “caught” it. And then I’ve replaced each rule name with the right-hand side of the rule. The two-phase steps “grow” a string, which you can already tell becomes an infinitely long string in the limit.

That’s the Thue-Morse sequence. By a quirk of the rules’ design, it looks as though it’s only “growing” at the right-hand end, but in fact each character is doubling in each step. Here I’ve hidden the little intermediate steps to highlight that:

step  string
 0    0
 1    01
 2    0110
 3    01101001
 4    0110100110010110
 5    01101001100101101001011001101001
 ...

But be aware that what’s “really” happening should look more like this in your head:

step  string
 0    0
 1    0               1
 2    0       1       1       0
 3    0   1   1   0   1   0   0   1
 4    0 1 1 0 1 0 0 1 1 0 0 1 0 1 1 0
 5    01101001100101101001011001101001
 ...
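
For the record, the synchronous two-phase process collapses to a few lines of Python (a sketch; the names are mine):

```python
def thue_morse_steps(n):
    """Apply the rules 0 -> 01 and 1 -> 10 synchronously, n times,
    returning every intermediate string starting from the seed "0"."""
    rules = {"0": "01", "1": "10"}
    s = "0"                                  # the seed
    steps = [s]
    for _ in range(n):
        s = "".join(rules[ch] for ch in s)   # every character at once
        steps.append(s)
    return steps
```

thue_morse_steps(3)[-1] is "01101001", matching step 3 in the tables above, and each step’s string is exactly twice as long as the last.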

In any case, what we get is a “growing” sequence of characters. If we take those characters as the digits of a fractional number in base 2 notation, we get a series of approximations that gradually approach the value of the Prouhet-Thue-Morse constant:

step  decimal                 binary
 0    0.0                     0.0
 1    0.25                    0.01
 2    0.375                   0.0110
 3    0.41015625              0.01101001
 4    0.412445068359375       0.0110100110010110
 5    0.41245403350330889225  0.01101001100101101001011001101001
 ...  ...                     ...
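
Each row of that table is just the step string read as a binary fraction. A quick sketch, reusing the same rules:

```python
def ptm_approximation(step):
    """Read the Thue-Morse string at the given step as a binary
    fraction 0.s, approximating the Prouhet-Thue-Morse constant."""
    rules = {"0": "01", "1": "10"}
    s = "0"
    for _ in range(step):
        s = "".join(rules[ch] for ch in s)
    return int(s, 2) / 2 ** len(s)   # e.g. "0110" -> 6/16 = 0.375
```

Step 4 gives 0.412445068359375, matching the table.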

This constant is irrational, and also transcendental, and it’s also interesting for a variety of reasons that don’t concern us here. What I want to do here is focus on that algorithmic process—the L-system one.

More substitution rules

I’ve written before about this kind of substitution system, but let me present a few more variants on the simple contextless rules I’ve been using so far.


The match patterns can have wildcards, just like [most] regular expression matchers:

a: /0(.)0/ -> "010"
b: /0+/ -> "9"
c: /0[1-4]/ -> "05"

Rule a replaces any single character that appears between two (adjacent) zeroes with a "1".

Rule b replaces any string of one or more consecutive zeroes with a single "9".

Rule c replaces any of the patterns in {01,02,03,04} with the specific string "05".


We can grow or shrink the patterns we match:

a: /0123/ -> "1"
b: /456/ -> "445566"
c: /7/ -> ""

Rule a replaces an occurrence of the exact string 0123 with a single "1".

Rule b replaces the exact string 456 with "445566".

Rule c deletes the character 7.
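
Python’s re.sub is a reasonable stand-in for a single pass of rules like these, with the caveat that it scans left to right over non-overlapping matches rather than rewriting synchronously. A sketch of the six rules above, on example strings of my own:

```python
import re

# wildcard rules
assert re.sub(r"0(.)0", "010", "070") == "010"      # rule a: 0x0 -> 010
assert re.sub(r"0+", "9", "10005001") == "19591"    # rule b: zero-runs -> 9
assert re.sub(r"0[1-4]", "05", "0102") == "0505"    # rule c: 01..04 -> 05

# growing, shrinking, and deleting
assert re.sub(r"0123", "1", "90123") == "91"        # rule a: 0123 -> 1
assert re.sub(r"456", "445566", "456") == "445566"  # rule b: 456 -> 445566
assert re.sub(r"7", "", "1776") == "16"             # rule c: delete 7s
```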


We can “capture” parts of the matched pattern:

a: /0(.)(\1)/ -> "1\1\1"
b: /4(5+)6/ -> "\146"
c: /(.)(.)\2\1/ -> "\2\1\2\2\1\2"

Rule a matches any string of three characters that starts with a zero, followed by two copies of the same character, and replaces the zero with a "1" (leaving the following digits unchanged).

Rule b matches a string of one or more 5 digits, preceded by a 4 and followed by a 6, and moves the 4 to the indicated position just after the string of fives.

Rule c matches a palindromic sequence of the form abba and replaces it with the same characters in the sequence babbab. So for example 8778 would be replaced by 787787, and so on.
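
The capturing rules translate to backreferences in Python’s re (one notational wrinkle: a replacement like the \146 above is written \g<1>46 in Python, since \146 would be ambiguous; the example strings are my own):

```python
import re

# rule a: a zero before a doubled character; the zero becomes a 1
assert re.sub(r"0(.)\1", r"1\1\1", "077") == "177"
# rule b: the 4 hops to just after the run of fives
assert re.sub(r"4(5+)6", r"\g<1>46", "45556") == "55546"
# rule c: the palindrome abba becomes babbab
assert re.sub(r"(.)(.)\2\1", r"\2\1\2\2\1\2", "8778") == "787787"
```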


You may already be playing with these rules, and might have noticed that there is an issue with conflicting rules in the schemes I’ve laid out. What about a grammar like this one?

a: /0/ -> "01"
b: /01/ -> "11"
c: /10/ -> "010"

Given the string "010", which rule(s) do we apply, and where? Does a recognize the first zero, or does b recognize that and the following one?

And how about this one?

a: /0/ -> "01"
b: /0/ -> "11"

Here, the same exact pattern is explicitly recognized by both rules. How do we discriminate or prioritize?

There are plenty of ways to disambiguate, but the one I’ll use here is sitting right in front of our faces: We’ll use the first matching pattern in the order the rules appear in the ruleset, and when a pattern is matched it’s immediately replaced by a placeholder that removes it from further consideration. So for example in the case of

a: /0/ -> "01"
b: /01/ -> "11"
c: /10/ -> "010"

we will never apply rule b, but rule c can pop in and “use” a zero that would fall to rule a in some other context. We just need to make a more detailed “transactional” process. Here, watch:

step  string
 0    010
      a10
      ac
 1    01010
      a1010
      ac10
      acc
 2    01010010
      a1010010
      ac10010
      acc010
      acca10
      accac
 3    0101001001010
...

See how that plays out? In each line I’ve consumed the first matching pattern (if any) that I find in the ruleset, and then when I’ve run through the entire string I do the substitutions themselves all at once. Using a meta-rule like that, I suppose one could handle any arbitrary list of rules. Rather than applying every one, everywhere it could be done, all at the same time, we can incrementally apply them throughout the string.

OK, so sure if the string is infinite, then there has to be an assumption that “incrementally” means “pretty much all at once, but with precedence rules”, but you get the picture.
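
Here’s a sketch of that transactional meta-rule in Python (names mine). It folds the two phases together, substituting matched spans directly instead of parking them in a placeholder, but the scan order is the one described: left to right, trying the rules in ruleset order at each position.

```python
import re

def transactional_step(s, rules):
    """One rewriting step: scan left to right; at each position the
    first rule (in ruleset order) that matches there consumes its
    match; unmatched characters are copied through untouched."""
    out, i = [], 0
    while i < len(s):
        for _name, pattern, replacement in rules:
            m = re.compile(pattern).match(s, i)
            if m and m.end() > i:
                out.append(replacement)
                i = m.end()
                break
        else:                 # no rule matched at this position
            out.append(s[i])
            i += 1
    return "".join(out)

rules = [("a", "0", "01"), ("b", "01", "11"), ("c", "10", "010")]
s = "010"
trace = [s]
for _ in range(3):
    s = transactional_step(s, rules)
    trace.append(s)
# trace == ["010", "01010", "01010010", "0101001001010"]
```

Note that rule b really never fires: any position where /01/ could match starts with a zero, and rule a gets there first.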

OK so let’s fuck around with π

For example, here’s a one-rule ruleset that converts π into a rational result.

a: /0|[2-9]/ -> "1"

I’m using the sorta-kinda regular expression syntax here to indicate that “every digit except 1 is turned into a 1”. When I apply this rule (in parallel) to π, what happens?

step  string
 0    3.141592653589793238462643383279502884197169399375...
      a.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
 1    1.111111111111111111111111111111111111111111111111...
 ...

Well, it looks as if it turns into the decimal representation of 10/9, and then stays there for all subsequent iterations… but I’ve fudged a little here, haven’t I? I’ve ignored the infinite sequence of preceding zeroes to the left of any finite real number in this representation, so I suppose to be more formally correct I should read this to mean:

step  string
 0    ...0003.141592653589793238462643383279502884197169399375...
      ...aaaa.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
 1    ...1111.111111111111111111111111111111111111111111111111...
 ...

and that, I admit, isn’t nearly as satisfying to see.
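
On a finite prefix of π, both versions of the trick are one re.sub call apiece (the digits are truncated by me, and the left-fill heuristic is left to the imagination):

```python
import re

PI = "3.141592653589793238462643383279502884197169399375"

# every digit except 1 becomes a 1 ...
ones = re.sub(r"0|[2-9]", "1", PI)       # "1." followed by all 1s
# ... or vanishes outright
survivors = re.sub(r"0|[2-9]", "", PI)   # "." plus the 1s in this prefix
```

The first gives "1.111…1" and the second "." followed by the four 1s that happen to fall in this 48-digit window.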

I could do the same trick in a slightly different way, though, and get the desired result… though perhaps with a different kind of “fudging”:

a: /0|[2-9]/ -> ""

which will do this:

step  string
 0    ...0003.141592653589793238462643383279502884197169399375...
      ...aaaa.1a1aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1aa1aaaaaaaa...
 1    ...0000.1111...

If I stop there (after one step) and as long as I add a weird-sounding heuristic like “zeroes to the left of the leftmost digit of the integer part are replenished as needed”, it looks like I’ve “converted” π into the decimal representation of 1/9.

Making the assumption that π is a normal number, and with a little more syntax in the L-system definitions, I should be able to play similar tricks to produce other rational numbers:

a: /4/ -> "X"
b: /[0-3]|[5-9]/ -> ""
c: /X/ -> "142857"

Two rounds of this ruleset (plus the heuristic that permits left-filling zeroes to appear) show us that I’ve converted π into 1/7:

step  string
 0    ...0003.141592653589793238462643383279502884197169399375...
      ...bbbb.babbbbbbbbbbbbbbbbabbbabbbbbbbbbbbbabbbbbbbbbbbb...
 1       ...0.XXXX...
         ...b.cccc...
 2       ...0.142857142857142857142857...
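
The three rules, chained on a finite prefix of π (my own transliteration; the left-fill zero again supplies the leading 0):

```python
import re

PI = "3.141592653589793238462643383279502884197169399375"

marked = re.sub(r"4", "X", PI)             # rule a: 4 -> X
kept = re.sub(r"[0-3]|[5-9]", "", marked)  # rule b: other digits vanish
result = kept.replace("X", "142857")       # rule c: X -> 142857
# kept is ".XXXX" for this prefix, so result is
# ".142857142857142857142857", i.e. 1/7 once the leading 0 is restored
```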

Arguably the “fill to the left with zeroes” heuristic has a companion “fill to the right of the last non-zero digit with zeroes”. We can convert π (and if you think about it, every other real number) into zero with a similar approach:

a: /[0-9]/ -> ""

This rule immediately converts any decimal number to a lonely little decimal point, which gets back-filled in both directions to be ...000.000....

What a boring function, huh? And yet, it might be useful in some circumstances.


Here’s an interesting one:

a: /0/ -> "0"
b: /1/ -> "0"
c: /2/ -> "0"
d: /3/ -> "0"
e: /4/ -> "0"
f: /5/ -> "1"
g: /6/ -> "1"
h: /7/ -> "1"
i: /8/ -> "1"
j: /9/ -> "1"

step  string
 0    ...0003.141592653589793238462643383279502884197169399375...
      ...0000.000110110111110001010100010011100110011011011011...
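
That ten-rule table is just a digit threshold; a sketch:

```python
def binarize(s):
    """Digits 0-4 become 0, digits 5-9 become 1; anything else
    (the decimal point) passes through unchanged."""
    return "".join(
        "0" if c in "01234" else "1" if c in "56789" else c
        for c in s
    )
```

For instance binarize("3.14159") is "0.00011".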

I’m going to suggest this is probably an irrational number, as long as there is no point in the decimal expansion of π after which the digits are irreversibly 4 or lower.

Then there are the “echoes” of π, like

a: /0/ -> "00"
b: /1/ -> "11"
c: /2/ -> "22"
d: /3/ -> "33"
e: /4/ -> "44"
f: /5/ -> "55"
g: /6/ -> "66"
h: /7/ -> "77"
i: /8/ -> "88"
j: /9/ -> "99"

step  string
 0    ...0003.141592653589793...
      ...aaad.bebfjcgfdfijhjd...
 1    ...00033.114411559922665533558899779933...

and so on…
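
The echo ruleset likewise collapses to one line (a sketch):

```python
def echo(s):
    # every digit is doubled; the decimal point passes through
    return "".join(c * 2 if c.isdigit() else c for c in s)
```

echo("3.141592653589793") reproduces the step-1 string in the table above.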

Indeed, there seems to be a lot of leeway here. It feels as though we can convert an irrational number like π into a rational one by certain well-planned rulesets, though it may not be simple or even possible in the general case. But it’s certainly true that we can substitute globally in the infinite sequence of digits and produce a variety of other derived irrational numbers, as well. And while we could keep re-applying a ruleset forever, we don’t seem to need to do so in order to change from irrational to rational values.

Maybe in the other direction we do, as in the case of the Thue-Morse sequence. There, we’re building a series of rational numbers that gradually approximate an irrational constant in the limit.

Those difficult problems I threatened

So here are some open questions:

Is there a finite set of rules, applied a finite number of times (but possibly more than once), which can transform π into e? That is, can we make the decimal string "3.14159..." into "2.71828...", with absolute precision?

That seems so unlikely that it’s intriguing, if you ask me. I mean… no, surely not.

Well, OK, how about this one:

Is there a finite set of rules, applied an infinite number of times, which can transform π into e? That is, can we make the decimal string "3.14159..." into "2.71828...", with absolute precision, as the limit of an iterative global replacement process like the one used to produce the Thue-Morse sequence?

Hmm…. Maybe? That’s an odd one.

But one more:

Is there an infinite set of rules, each with a finite match pattern and substitution string, applied a finite number of times, which can transform π into e?

You may notice that I haven’t mentioned a finite set of rules with infinite match patterns… because hey that’s trivial! Consider this super-dooper cheaty one:

a: /3.14159.../ -> "2.71828..."

One wonders if there are less… heavy-handed variations, though.


There is a different facet of these substitution-functions. What does one look like, applied to (for instance) the domain of the reals?

For example, take the “echoes” rule I mentioned above:

a: /0/ -> "00"
b: /1/ -> "11"
c: /2/ -> "22"
d: /3/ -> "33"
e: /4/ -> "44"
f: /5/ -> "55"
g: /6/ -> "66"
h: /7/ -> "77"
i: /8/ -> "88"
j: /9/ -> "99"

What does that function “look like” when plotted over all real x?

It’s commonly believed (though not yet proven) that π is a normal number. That is, every finite string of digits appears somewhere in the expansion, and with the frequency you’d expect by chance.

The first and most accurate approximation of π appears in the decimal expansion of the number, starting from the first position. That is, "3.14159..." is the number itself, and so it’s the best possible approximation.

But we can assume that somewhere later on there is a sequence 3141, and somewhere a sequence 314159, and so on. How many contiguous digits of π might appear in π itself?

Considering every contiguous substring of the decimal expansion of π, we can measure its “latency” (distance from the start of the fractional part) and its “error” (absolute difference from the value of π, once the two are scaled to the same magnitude).

So for example, here is a little table of a few four-digit substrings:

3.141592653589793...   latency  error = |(π/10)-(i/10000)|
  1415                 1        0.172659...
   4159                2        0.101741...
    1592               3        0.154959...
     5926              4        0.278441...
      9265             5        0.612341...
       2653            6        0.048859...
        6535           7        0.339341...
         5358          8        0.221641...
          3589         9        0.044741...
           5897        10       0.275541...
            8979       11       0.583741...
             9793      12       0.665141...
              ....

You can see that among these first few four-digit substrings, 1415, 4159, 2653 and 3589 seem to be the mutually-nondominated ones. That is, no other entry in this brief list is both closer to the start and numerically closer to 0.31415....

What substrings of the fractional part of π (in base 10 notation, for now) are mutually non-dominated with regards to latency and error? Notice that here I’m only talking about the fractional part, to the right of the decimal point, and so what I mean more specifically is the absolute distance between π/10 and the ratio of the integer value of a given substring of consecutive digits to 10 raised to its length. So for example, the substring 5358, with four digits, produces an “estimate” of 5358/10000, or 0.5358, which is then compared to π/10 to produce the absolute error measure.
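
A sketch that reproduces the table and pulls out the successive error records (each substring whose error beats every earlier one), using the same |(π/10) − (i/10000)| measure; the variable names are mine:

```python
import math

PI_FRACTION = "141592653589793"   # fractional digits of π, an excerpt

records, best = [], float("inf")
for latency in range(1, 13):
    sub = PI_FRACTION[latency - 1 : latency + 3]  # four consecutive digits
    error = abs(math.pi / 10 - int(sub) / 10000)  # distance from 0.31415926...
    if error < best:                              # a new record low
        best = error
        records.append(sub)
# records == ["1415", "4159", "2653", "3589"]
```

The records are exactly the nondominated front, read off in latency order.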

This provokes a glib follow-on question or two:

What is the longest sequence of digits reproducing the leading digits of π that appears somewhere in the fractional part of π itself?

And also:

How much of π can you find in e? In other famous irrationals?

  1. Everything I’m writing here applies to the more general collection of complex numbers as well; I’m simplifying (probably unnecessarily) to get to a point.