Étude: How much do you need to know?

Draft of 2017.06.16

May include: puzzle ↘ mathematics ↗ information theory ↖ &c.

Let’s work through this as a series of increasingly challenging questions. As always (I have the sense I’m repeating myself, but one never knows who will stop by and when), these are presented as open questions for exploration, not “homework” style puzzles with known best answers. I may not even know the “answers” myself. Assume when I ask something, I literally want to know what you think.

A block of numbers

Suppose I arrange the numbers $\{1,2,3,4,5,6,7,8,9\}$ in a three-by-three square. In some order, which I don’t tell you. I hide it.

Here’s such an arrangement, as an example:

3 1
5 4
6 7

There are nine integers there, all distinct. That means there are $(9 \times 8)/2=36$ pairs of integers.

Suppose now that I tell you, for each one of those 36 pairs of integers, the Manhattan distance between them. For the example above, this would be something like:

2 => 3   
3 => 1   
4 => 1   
5 => 2   
6 => 3   
7 => 2
8 => 2
9 => 4
3 => 2
4 => 2
5 => 1
6 => 2
7 => 3
8 => 1
9 => 1
4 => 2
5 => 1
6 => 2
7 => 3
8 => 1
9 => 3
5 => 1
6 => 2
7 => 1
8 => 3
9 => 3
6 => 1
7 => 2
8 => 2
9 => 2
7 => 1
8 => 3
9 => 1
8 => 4
9 => 2
9 => 2

In that list I’ve given you all those pairs of values in the square, plus the total number of up/down and left/right steps between them. Given that I’ve also told you that the numbers are arranged in a square (we’ll come back to this), it seems to me that you have enough information to place the numbers into a correct solution.

Do you agree?

Before you jump to qualify, let me point out that by “correct solution”, I mean that you are able to place all nine values into a square such that the numbers you place produce the same distances I’ve listed. That is, the 36 constraints I’ve provided will all be satisfied by your numbers; they suffice to find some solution. I know there are certain symmetries that could make your particular square differ from mine in terms of the absolute positions. For example, it might be reflected, or rotated. That is, these squares are also solutions:

2 8     8 2 9     1 3 8
5 3     3 5 6     4 5 2
4 1     1 4 7     7 6 8

There are other solutions as well. How many?

How much do you need to know?

Now for the fun part: Let’s explore what happens when I don’t give you all 36 “clues”.

Suppose I only give you 35 of them. Sticking with the example above for now, it seems to me that you will still have enough information to build a solution to the complete square, even if you’re lacking just one of the constraints.

I confess I haven’t checked, though. Is it true that you can still always build a solution that will always satisfy all 36 constraints, even if one of those constraints is hidden from you?

What about if I hide two of the constraints?

It seems to me that for some number of hidden constraints, you will suddenly lack enough information to solve all the constraints. That may also depend on which particular constraints I hide, don’t you think?

Maybe it’s productive to approach it from the other direction. If I give you only one constraint, say 4:6 => 2, what can you say about the position of the numbers in the square?

Well, given a distance of 2 steps, those values are either on opposite edges of the square, or diagonally adjacent to one another. In other words, we can produce the following solutions:

4 ? 6    ? ? ?    ? ? ?    6 ? 4
? ? ?    4 ? 6    ? ? ?    ? ? ?   ...
? ? ?    ? ? ?    4 ? 6    ? ? ?

4 ? ?    ? 4 ?    ? ? 4    6 ? ?
? ? ?    ? ? ?    ? ? ?    ? ? ?   ...
6 ? ?    ? 6 ?    ? ? 6    4 ? ?

4 ? ?    ? 4 ?    ? 4 ?    ? ? 4
? 6 ?    ? ? 6    6 ? ?    ? 6 ?   ...
? ? ?    ? ? ?    ? ? ?    ? ? ?

... and so on

For each of these templates, we can fill in the spaces marked with “?” using any arrangement of the remaining 7 values, or 5040 different ways. And there are (it seems to me) six templates where 4 and 6 are on the same row, six templates where they’re in the same column, eight templates where one is down and to the right of the other, and eight where one is down and to the left of the other. That’s something like $(28 \times 5040) = 141120$ different “solutions”, given only the constraint 4:6 => 2.

If I’ve counted right.

But I notice it also seems to depend which constraint I give you. For example, if I’d given you 7:8 => 4, there would only be four “templates” (where the numbers are in opposite corners), and therefore only $(4 \times 5040) = 20160$ “solutions”.

In other words, 7:8 => 4 somehow feels more “informative” than 4:6 => 2. It collapses the set of possible solutions “faster”, getting us closer to a solution that will satisfy all 36 constraints, even the ones I’ve hidden from you.

So, having shown that at least in some circumstances, some constraints provide different “clues”, let’s go back to the case where I’ve only hidden one of the 36 constraints.

Is it always the case that you can solve the 36-constraint problem with only 35 constraints?

What is the minimum number of constraints I need to hide from you, for you to be unable to solve the 36-constraint problem with certainty, assuming I am as ruthless as possible when I select the ones to hide?

What is the minimum number of constraints you need to be given, assuming you can pick the most informative ones that suffice to solve the 36-constraint problem?

Call those specific constraints—the ones you need to be told, if they are the best possible subset, so that they are the smallest possible subset you need to solve the complete set of constraints—the “best set”. If I randomly give you a set of constraints equal in size to the best set, what’s the probability you will have enough information to solve the 36-constraint problem?

Widening the problem

What happens if we make the square bigger?

What happens if I tell you that the numbers are arranged in a rectangle, but don’t tell you the dimensions of that rectangle?

What happens if I tell you that the numbers are arranged in a lattice, such that every number is adjacent on one edge to at least one other number, but that they don’t have to form a rectangular shape? That is, they form a polyomino, like this one for example (where I’ve placed a “.” in unoccupied squares):

1 2 . . .
. 3 4 5 .
9 6 . 7 8

It seems to me that this last case opens a can of worms for us. Is it even still true that, given all 36 pairwise Manhattan distances, I can determine the shape of the polyomino?

Chris Freeman asks about this last one: “In the widened/polyomino example, is the Manhattan distance for 1:9 => 2 or is it 4?”

Good question!

I originally framed the polyomino version on the assumption that the Manhattan distances would be calculated on the abstract grid itself. Which is to say, 1:9 => 2 as I was thinking it.

More completely, for this polyomino

1 2 . . .
. 3 4 5 .
9 6 . 7 8

Here are some Manhattan distances, the way I originally assumed they’d be measured. Only a few will change based on the approach, though, so I’ve pointed those out:

2 => 1   
3 => 2   
4 => 3   
5 => 4   
6 => 3   
7 => 5
8 => 6
9 => 2   <= this one is tricky
3 => 1
4 => 2
5 => 3
6 => 2
7 => 4
8 => 5
9 => 3
(and so forth)
7 => 2   <= here's another one that gives it away

In other words, the Manhattan distance can “jump” across concave parts of the polyomino. It doesn’t have to count along the “filled-in” parts of the lattice.

This brings up a followup question, though: Suppose I had said that the distance was defined only along the backbone of the shape, where there are numbers on the lattice. Assuming I told you whether the distances were measured in one way or the other—along the lattice, occupied or empty, or along the shortest occupied path connecting two numbers—would it make any difference in the results?

I honestly don’t know. A (possibly) simpler question would be: for what polyominoes, of what size (if any) might it matter?

Things I’ve noticed so far

Let’s talk about the simplest and/or first questions, regarding squares of numbers.

First of all, it’s interesting to note that the squares smaller than nine numbers are remarkably boring. The one with one number is about as boring as you can get, since I can tell you with 100% certainty where every number lies within it.

The one with four numbers in a 2x2 block is almost as boring, since there are really only two variations: one where 1 and 2 are next to each other, and one where 1 and 2 are not next to each other. Taking into account the reflections and rotations that are possible, that means (it seems to me) that as long as I know any one constraint, I can deduce the relative positions of all the numbers in the square.

It’s interesting (and possibly a bit foreshadowy) that the nine values in the 3x3 square aren’t quite as determined as the smaller squares. It’s almost as though the number of possible pairwise distances $(N \times (N-1)\div 2)$ u may not be increasing quite as fast as the number of possible arrangements of numbers in a square. Might it be that there isn’t enough information in a complete list of all $(25 \times 24)/2 = 300$ pairwise distances in a 5x5 square to fully determine where everything is? What about the $(100\times 99)/2 = 4950$ pairwise distances for a 10x10 square?

That seems like a lot of redundant information. But it doesn’t feel like quite as much information as soon as I start relaxing the constraint that the numbers lie in a square.