Some Continued Fractions benchmarks for machine learning

Draft of 2017.12.19

May include: GP ↘ benchmarks ↗ &c.

One of the subjectively interesting aspects of working in the field of genetic programming—and software synthesis more generally—is this constant introspective musing sense one develops. You’re always wondering whether simple things you encounter here and there would still be “simple” for your artificial system-that-writes-code. If you can do a homework problem in a book you pick up off a shelf, can your made thing do that same problem? Because of course the long-term goal in these fields is to ask your made thing to do problems you can’t do….

For me, this feeling is evoked a lot by number theory and relatively simple data structures. I don’t have any formal mathematics or computer science background, so I was never given the more onerous homework problems. So to me, an older human, they’re kind of “fun”.

I’ve often thought that anybody wanting a sense of confidence in the creative and problem-solving capacity of their genetic programming system should pick up a volume of Knuth or (as Tom Helmuth has done in his work) an introductory programming textbook, open it to a random page, and make your GP system do the exercises you find there. Most GP systems I encounter are too simple-minded to do much more than arithmetic or symbolic modeling of data, and while that’s a vital and impressive improvement over neural networks’ matrix stuff, it’s still not inventing code.

If you want your thing to surprise you, then it will need access to a sufficiently diverse toolkit of basic concepts and structures, of the sort human people beings use. But you also quickly learn to avoid simply dropping in All the Libraries™, because that can end up shifting an interesting general-purpose problem-solving puzzle into something more like a sorting-your-toolbox-and-try-all-the-stuff problem. There’s a happy medium somewhere between the impossibility of an artificial system that just gives you the right answer immediately and with no explanation, and the trivial artificial system that asks, “Is it AB? Is it BB? Is it CB? Is it DB…” and so on.

So when we build one of these things, we start with simple stuff, and work our way up from there. I like puzzles and mathematical recreations, personally, and it happens that a lot of puzzles and mathematical recreations involve discrete math, number theory, and some simple geometry. Which is why the GP system I’ve been developing over the last few years has a lot of discrete math, number theory and simple geometry “tools” in it.

So I’ve included support for vectors (or ordered collections) because there is a large and interesting class of numerical problems which seem—naturally, or at least historically—to “want to be done using tuples”. And of course if your system has tuples of numbers, and also individual scalar numbers, and you want it to be able to shift representations back and forth between those two domains relatively easily, then you need to afford the system some easy paths to do so. It can’t just sit there looking at a stack of loose numbers with a vector in its hands, asking itself, “How do I make one of these out of those?” There are subtle relations possible between the individual scalar values and the order in a vector, which we take for granted when we’re doing mathematical work but which an artificial thing doesn’t have access to.

That’s interesting. That’s when GP realizes its potential for illuminating our own internal models of the world.

The tasks I’m going to discuss today are exactly aimed at that crack in our heads: What do we need to provide, in order that we can interconnect what would otherwise be a general-purpose pile of loose and arbitrary-seeming numbers and functions? How do we help something that looks like a neural network become a system that better “understands” which numbers and functions might be suitable in a given context?

Generalized continued fractions

Continued fractions are fun. Partly because they use only simple arithmetic, partly because they have that just-off-the-beaten-path feeling of never being treated in standard mathematics curricula, partly because they tend to show up in old-fashionedy puzzle books and mathematics theorems. When people write about continued fractions, they often do so in the context of unusual Egyptian number systems, and drop big names like Euclid and Gauss. They’re stuff you can play with using a calculator and pencil-and-paper notation, but they’re also core to some very fancy complex analysis topics.

I want to talk about non-standardized continued fractions, and I’ll imply that what I mean are these generalized continued fractions. I don’t want to mention complex numbers in this piece, at least not until the end, so assume when I’m talking about these things that all the constants in hand are real numbers. For now.

I’ll represent a continued fraction as an ordered collection of real constants, like \([b_0; a_1, b_1, a_2, b_2, a_3, b_3,\dots, a_n, b_n]\), which I’ve named following the (pretty typical) Wikipedian nomenclature convention. These are placed the following fixed recursive arithmetic structure to calculate some specific constant real value \(c\):

\[c = b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \dots}}}\]

So for example, \([1;2,3]\) represents the value \((1+\frac{2}{3})\), and \([1;2,3,4,5]\) would therefore expand to be \(1+(2/(3+4/5))\).

You can do a lot of interesting things with these numbers, and they’re a core representational tool in Real and Complex Analysis applications. As you can probably imagine, if the constants are integers (or even rational numbers), and there are a finite number of them, then when we work out the value of \(c\) we’ll find it’s a finite rational number. If we use infinite sequences of constants—whether they’re periodic or not—then the calculation of \(c\) as we increase the subset of terms used (the convergents of \(c\)) represents increasingly precise recursive approximations of irrational values of \(c\).

Most famous of these irrational numbers is \(\pi\). Several years back, Evelyn Lamb wrote a brief but lovely appreciation of \(\pi\) in its continued fractional representation. Another well-known value is that of the Golden Ratio, \(\phi\), which has the lovely continued fraction representation \([1;1,1,1,1,1,1,1,1,1,1,1,\dots]\), or

\[c = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \dots}}}\]

Some of the first few convergents of \(\phi\) (in this form) would be:

\[[1] = 1\] \[[1;1,1] = 2\] \[[1;1,1,1,1] = \frac{3}{2}\] \[[1;1,1,1,1,1,1] = \frac{5}{3}\] \[[1;1,1,1,1,1,1,1,1] = \frac{8}{5}\] \[[1;1,1,1,1,1,1,1,1,1,1] = \frac{13}{8}\]

At which point you should start saying whaaat? unless you already expected to see the Fibonacci sequence there. But that’s for another day.

Beyond these old familiar irrational numbers like \(\pi\) and \(\phi\) and \(e\) there’s some extraordinary number theory we could pause to consider, relating continued fractions that terminate, infinite ones that are periodic, and infinite ones that are not periodic. But that’s also for another day.

Instead let’s talk about the inverse problem for these representations. That is, if I give you a real value \(c\), how do you (a human) work out a series of constants of the form \([b_0;a_1,b_1,\dots]\) that represent that same value \(c\) (possibly in the limit)? There are relatively simple algorithms known. For instance, the Euclidean algorithm gives you a series of constants with certain standard characteristics.

But there are certainly other algorithms besides that one in the space of algorithms for this inverse problem. Keep in mind that there are many different representations for any given value \(c\). The canonical standard form sets all \(a_i\) constants to \(1\), and if possible assigns \(b_i\) constants to be positive integers. But we can also allow any of these constants to be negative integers if we like, or rational numbers, or irrational numbers in their own right, or complex numbers of any of those flavors….

We can, in other words, play with the numerators and denominators in this framework more or less arbitrarily.

Try it out to warm up: How many ways can you find to represent \(c = \frac{7}{5}\) using only positive integer constants? What about with positive and negative rational constants? Is there a way to do it with only negative constants? Is there a way to do it with exactly ten non-zero constants, rather than three or four you might initially have picked? Are there ways to make finite continued fraction representation of a rational value longer, and still have it represent the same value?

On the in-between edge cases

In this series of benchmark problems for genetic programming systems, we’ll be working only with finite generalized continued fractions, and for simplicity only real-valued ones.

You might have noticed that all the examples I’ve given so far include an odd number of constants in the expanded continued fraction forms. Let me specify what happens in other cases.

First, let’s assume that \(b_0\) is \(0\) by default.

Also, that any missing \(b_i\) value, for \(i > 0\) is \(1\).

Finally, let’s say that any missing \(a_i\) value should be assumed to be \(0\).

Let me work out a different series of convergents (which with these default value assignments may no longer be actual “convergents” in the rigorous sense of the term), for an infinite series like \([1;2,3,4,5,6,\dots]\):

\[[1] = 1\]
\[[1;2] = 3\]
\[[1;2,3] = \frac{5}{3}\]
\[[1;2,3,4] = \frac{9}{7}\]
\[[1;2,3,4,5] = \frac{29}{19}\]
\[[1;2,3,4,5,6] = \frac{59}{37}\]
\[[1;2,3,4,5,6,7] = \frac{233}{151}\]

I haven’t bothered to show the “default” values for “missing” terms here, but you can insert them into any of these continued fraction vectors (being careful to keep your \(a\) and \(b\) assignments straight), and get the same results. So where I’ve written \([1;2,3,4]\) for example, I could have written \([1;2, 3, 4, 1, 0, 1, 0, 1, 0, 1]\) and gotten the same value for \(c\).

(Which answers one of the questions I asked you above.)

A few GP benchmarks

Let’s see if we can work through a series of genetic programming benchmark problems in (rough) order of increasing difficulty.

Now “difficulty” here can be taken to mean the difficulty in representing the path from a training case to a good (even if approximate) answer—in other words, the amount of “stuff” that might need to be present in your GP system’s toolkit to even be able to “conceptualize” an algorithm at all. But “difficulty” in a genetic programming context is more often read as something like “propensity for my system to find a solution in a finite number of evaluations”, and I can’t address that because I’m not privy to your GP search algorithms.

I think, in other words, that I mean “amount of stuff if feels like a system would need to know to even begin to address these tasks”.

For instance, can your system even handle ordered collections of numerical values? Because for many of these problems, I mean to suggest input values like \([8;3,1,-2,0,9,1,7,-4]\) in one case, and \([77;2,3]\) in another. If those vector-like collections can’t be passed in directly, you’ll need to consider either adding the abstraction of “vectors”, or extending them to fit a fixed size (by using the default constant values I mentioned above).

Or you may have noticed that I showed you exact rational results for the convergents I’ve showed so far. You may want to be concerned whether rational values are the way to go here, or whether it would be simpler or more effective to use floating-point values. And if you use floating-point values, then consider what degree of precision will be needed to detect a difference between (for example) a 31-term continued fraction and the 30-term shorter convergent. If the values are taken from an irrational number, you might need to wonder whether 32-bit or 64-bit or maybe some arbitrary-precision representation is called for.

As I’ll say again at the end, the point of these “benchmarks” is the exploration of the ways in which your GP system undertakes the search, not the final results. Don’t treat them as a checklist of tasks to solve; rather think of them as a series of increasingly challenging “psychological” puzzles you’re handing your supposed artificial intelligence. Watch what partial solutions arise, and observe closely enough to know whether things are “stuck” or some new “insight” may still come along.

Which fraction is bigger?

As a warmup exercise: Given four positive integers (which should in at least some cases be very large), \(a\), \(b\), \(c\) and \(d\), determine whether

\[\frac{a}{b} < \frac{c}{d}\]

Watch for simple heuristics here as a solution develops. Again, we’re not looking to see the system learn an algorithm to find a common denominator. Rather do this one to see which heuristics and approximations crop up along the path to that discovery.

To avoid simply “discovering division”, make sure that some of the training cases are large-enough integers that the floating-point differences in their ratios disappear.

Rediscover the “continued fraction formula”

Given some vector of values \([b_0,a_1,b_1,a_2,b_2,\dots]\), return the \(c\) value this represents.

If your system can’t handle arbitrary-length vector items, then fill with \(b_i = 1\) and \(a_i = 0\) as needed. But in the (more interesting) case that your system does handle arbitrary-length numerical vectors, watch for the “discovery” of these default values themselves.

Does the system find an approximation that uses the first term only? Does it “realize” that iteration or recursion is important? If your system doesn’t explicitly include iteration or recursion, how does evolutionary search achieve the necessary result?

Keep in mind that this may be the sort of thing a neural network could accomplish at least as well as a GP system. What reasons might there be for working in one representation and search scheme, as opposed to the other?

Which continued fraction is bigger?

Given two vectors in the form \([b_0,a_1,b_1,a_2,b_2,\dots]\), which one represents a bigger \(c\) value?

This seems trickier, in some sense, than either of the prior benchmarks, though it seems also to be related. For instance, I find examples like this pretty confusing, because the intuitive heuristic I’d use would involve looking at the “first few terms”:

\[c_1 = [0; 6, 4, 20, 20, 1, 9, 20, 18, 15, 2, 12] = 1.2011894508627603\] \[c_2 = [0; 14, 14, 2, 15, 5, 9, 5, 12, 4, 2, 16] = 0.9908858753082072\]

There may be something about “big numerators and small denominators” tucked in there, but it’s beyond me at the moment. This feels as though it may be the place where software synthesis of algorithms starts to do things people can’t easily do “in their heads”.

Open-ended continued fraction derivation

Given some value \(c\), produce an (accurate) \([b_0,a_1,b_1,a_2,b_2,\dots]\), using any sort of handy number in the constants.

This brings us deep into the world of iterative or recursive algorithms. If your system isn’t “comfortable” with those algorithmic concepts, then the algorithms it does develop may be interesting but opaque in their own way. For example, it seems perfectly feasible to ask a well-made neural network to produce an appropriate vector of numerical values.

What benefits, if any, would there be to using a neural approach?

Constrained continued fraction derivation

Given some value \(c\), produce an (accurate) \([b_0,a_1,b_1,a_2,b_2,\dots]\), using only positive integers as the constants.

For at least some target values, there will be no accurate representations possible in a given number of convergent terms. If your system can’t “add more”, it may find these problems more difficult.

This opens the door to multiobjective goals as well. One wants to see an accurate and parsimonious solution, right? How will you handle that?

Diverse pairs of open-ended derivations

Given a value of \(c\), return two different representations which are both accurate, and with as large as possible a summed-squared difference in the constant terms.

As it works on this task, notice how the system “approaches” it. Does it find one accurate representation and then apply a heuristic for making a second based on that one? Or does it develop an approach in which a pair is somehow improved simultaneously?

Highly constrained continued fraction derivation

Given some value \(c\), produce an (accurate) \([b_0,a_1,b_1,a_2,b_2,\dots]\) where some of the constants are already assigned, fill in the missing values.

Though I haven’t explored it in detail yet, it seems likely that there is a “tipping point” here. That is, for the totally unconstrained problem, there seem to be innumerable (infinite?) answers. As soon as we fix at least one value, then there are (possibly subtle) constraints placed on the others. If we fix more constants to arbitrary values, then at some point it seems as though these spreading constraints will begin to overlap one another.

When that happens, a partial assignment may well become unsatisfiable. For example, consider:

“find \([b_0,a_1,b_1,a_2,b_2] = 8.125\)” is feasible-seeming
“find \([b_0,a_1,b_1,4,b_2] = 8.125\)” is probably (?) feasible
…
“find \([3,-2,9,7,22] = 8.125\)” is infeasible

Somewhere between unconstrained and overconstrained there will be a tipping point (I imagine). What will your system do to determine whether a solution is feasible or not? What search process and constraint-handling infrastructure will it develop so that it can produce a correct result whenever one is feasible?

Standard form

Given \(c\), represent it in strictly standard form, with all \(a_i = 1\), and all \(b_i \in \mathbb{Z}^{+}\) (positive integers).

There’s a lot going on here. Good luck.

Standardization algorithms

Given some general form \([b_0,a_1,b_1,a_2,b_2,\dots]\) for a number, produce \([b_0',1,b_1',1,b_2',1,b_3',\dots]\) with all \(a_i\) constants set to \(1\) exactly.

This is asking too much, maybe. But then again, if your system passes through an intermediate representation, and can also convert from that representation into a standard form continued fraction….

And then what?

That should be enough for today. But let me say again what the point here should be.

These are, for the most part, tasks people have been able to do for a long time. Some of them are mentally challenging if you’re limited to pencil and paper, but realize that these are not “benchmarks” for showing the power of genetic programming or neural networks or machine learning at human-competitive tasks.

Nor are they intended to be checks to see if your GP or neural or other fancy-dancy machine learning system is “ready for primetime”. They’re difficult toy problems, but they’re still toys. Yes, you can have more confidence in the capacity of your system at problem-solving tasks by… well, you know, solving more problems. But except for being interesting mathematical exercises that people generally don’t come across unless they’re mathematicians, and aside from being preliminaries to more complicated complex-valued problems with real open research topics hanging off of them, we can’t say that an artificial system that does any of this work is in any sense “doing what Euclid did”. Because we’ve already got the tools Euclid had to develop for himself.

As Barbara said the other day, they’re computational recreations for computers to do. Use them as challenges you can pose to a system you’re building, by which you can understand what’s happening inside that system. They’re psychological probes: Does your system “realize” that the left-hand values of a vector have more influence than the right-hand values? Does it “notice” that pairs of constants \((a_i,b_i)\) form a recurring relationship with one another? Does it find convenient heuristics or intermediate representations that simplify these increasingly modular tasks?

Watch to see what happens. The artificial systems we build either will or won’t “solve” any given problem, but the point is not to attempt to get them to “win” every game. Rather, the point of these benchmarks is to surface places in which they should be asking for help, where they stumble, where we can—if we’re thoughtful enough about it—give them a voice.

Think about partial credit. Think about what you, a human teacher, should do to help a struggling student-system learn a difficult concept.

So yes, these are computational recreations for computers to do, but only under the watchful eye of a human teacher-builder. The most suitable outcome of any given test should be a novel realization in the builder, not in the system they are building.

If you miss that crucial point, then you probably shouldn’t be working in AI at all.