When do continuous spans overlap?

Draft of 2016.07.02

May include: Push ↘ GP ↗ learning in public ↖ &c.

This morning I thought I’d spend a “few minutes” to address an old issue in the Klapaucius Push interpreter. Long ago, in some other genetic programming interpreter—I no longer recall which one—I remember implementing a type which described a continuous range of scalar values. I think it was called a span, but it might also have been called a range, but now I’m working in Clojure and a range is a core function describing something rather different. I want mine to be a bit more abstract and even lazier than the Clojure function: more of a static structure, honestly.

In the last year it’s become surprisingly easy to add a new type to the Klapaucius interpreter, though as I look over it I see some places to improve and simplify even more. But that’s for another day. The basic workflow is (today):

assuming the new type requires a Clojure record to manage it, create a new file like push.type.definitions.span, where the defrecord and any associated type-specific “helper” functions will be placed, plus a recognizer predicate function that will tell me whether an arbitrary Clojure thingie is a Push :span or not
create a Midje test file in push.type.definitions.span_test, and start implementing tests for the core functionality of the type
create a file like push.type.item.span, which actually constructs the Push type itself, defines its instructions with the Klapaucius build-instruction macro and PushDSL code, adds aspect-based automatically-generated instructions if you want any, and finally invokes make-type (or make-module) to compile it all into a single record
build a Midje test file like push.instructions.base.span_test to exercise the new type’s instructions and router, and make sure an arbitrary Push interpreter does what’s expected whenever any instruction is called
actually wire the fully-tested type (or module) into the interpreter templates, especially the one I’ve been calling one-with-everything¹

Actually, now I look at this it doesn’t seem quite as “simple” as I imagined. It’s simpler than it has ever been, but I think very soon I will need to fix it quite a bit more. It should be three steps, easy to understand and explain and implement.

But not today.

Spans

The :span type I have in mind represents a continuous set of real numbers, a “region of the number line”. As you probably remember from High School math, we speak of these regions as being “open” or “closed” on each end, too, depending on whether the number at the end is strictly part of the set or is just outside the set. For example, I’m envisioning the :span \([3,4]\) includes the numbers 3 and 4 and every other number in between; \((3,4]\) does not include the number 3, just everything between it and 4 (and also 4); \((3,4)\) includes all the numbers between 3 and 4, but neither of those.

Unlike the mathematical convention—and mainly because I know random programs and evolutionary search are going to do this even if I don’t want them to—I’m willing to permit orientation here as well. That is, \([3,4]\) is not equal to \([4,3]\) in my scheme, but they do coincide with one another, in the sense they contain the exact same set of numbers. This doesn’t imply anything about the mathematics; it’s honestly just an arbitrary design decision I’ve made to see what happens (if anything).

I figure I’ll spend some time building some “simple infrastructure” (famous last words) in push.type.definitions.span, because I know/remember a few things I want this type to do. Most of the simplest stuff comes “for free” when I add a Push aspect like equatable or moveable later on, so I can focus on the type-specific behavior here. What I have in mind are mainly relations between :span items and :scalar values:

does the :span contain the :scalar?
does the :span overlap another :span?
does the :span contain another :span?

Those seem pretty straightforward, but (foreshadowing!) I am starting to guess that with orientations and open vs closed ends, there might be some edge cases to handle.

Literal edge cases

Getting started is pretty easy, although I think for a bit about some possible architectures for the basic Clojure Span record, to handle open vs closed ends. Obviously every Span will have a :start and :end, because duh. But either or both of those could be “open” or “closed”, and one wonders how to handle that. One “simple” way is to add two new fields to the record that specify each end’s openness, something like :start-open? and :end-open?. Another way might involve making a variety of different Span “subclasses”, like Span and HalfSpan and OpenSpan, but that idea makes me back away. Another other way might involve leaving Span with a single :start and :end, and then pushing “open” vs “closed” somehow down into the values of :start and :end. This last one is interesting, but odd; I wonder if there is such a thing as an “open number” vs “closed number”, because at least up to the point where this occurred to me, I only thought numbers were numbers.

While the last one feels philosophically interesting, I’m actually not here to do much philosophy. The first one feels like the simplest way to proceed, so I add

(defrecord Span [start end start-open? end-open?])

This is just raw Clojure, and to be honest I never like using the “free” constructor ->Span outright, so I add a simple wrapper as well, like this:

(defn make-span
  "Takes `:start` and `:end` scalar numeric arguments
  and by default returns a closed `Span` record.
  Optionally takes `:start-open?` and `:end-open?` 
  boolean keyword arguments, which can create a 
  partially or fully open span."
  [start end & {:keys [start-open? end-open?] 
                :or {start-open? false end-open? false}}]
  (->Span start end start-open? end-open?)
  )

I test that a bit while I’m writing it and for the most part beforehand, but I’m recording this narrative after the fact so let me show you how the tests end up. Just to give a sense of how Midje testing works, in case you’re landing here out of order:

(ns push.type.definitions.span_test
  (:use midje.sweet)
  (:use [push.type.definitions.span]))


(fact "make-span returns a Span record with closed bounds"
  (make-span 9 6) => (->Span 9 6 false false)
  (make-span 1 1) => (->Span 1 1 false false))


(fact "make-span can take optional keyworded open bounds arguments"
  (make-span 9 6 :start-open? true) => (->Span 9 6 true false)
  (make-span 1 1 :end-open? true) => (->Span 1 1 false true)
  (make-span 2 3 :start-open? true :end-open? true) => (->Span 2 3 true true))


(fact "Span getters work as expected"
  (:start (make-span 3 4)) => 3
  (:end (make-span 3 4)) => 4
  (:start-open? (make-span 3 4)) => false
  )

Like I said, I’ve written most of these tests before I wrote code to make them pass, using the errors and failures produced by Marick’s excellent Midje framework to direct me what is needed after each new test is added.

The question of whether a number falls “within” a :span seems like a good starting point after this infrastructure is done. Here are some tests I end up with:

(fact "span-include? works for closed spans"
  (span-include? (make-span 3 4) 3.5) => true
  (span-include? (make-span 3 4) 3) => true
  (span-include? (make-span 3 4) 4) => true
  (span-include? (make-span 3 4) 2) => false
  (span-include? (make-span 3 4) 7) => false

  (span-include? (make-span 4 3) 3.5) => true
  (span-include? (make-span 4 3) 3) => true
  (span-include? (make-span 4 3) 4) => true
  (span-include? (make-span 4 3) 2) => false
  (span-include? (make-span 4 3) 7) => false
  )

And the code I need to make those pass:

(defn span-include?
  "Takes a Span record and a scalar, and returns
  true if the scalar falls strictly within the Span"
  [span n]
  (let [s   (:start span)
        e   (:end span)]
    (not= (compare s n) (compare e n))))

I’m pretty pleased with that compare sitting there. The Clojure compare function is one of my favorite algorithmic gimmicks, it turns out. For numeric arguments² it returns -1 if the first argument is strictly less than the second, 1 if it’s strictly larger, and 0 if they’re equal. Just sketching a bit, I realize that if the number is outside the span on the left or right, it will be (respectively) less than or greater than both ends of the :span, regardless of the orientation of that :span. If it “lands on” either end, it’ll be equal and not-equal….

Oh hang on.

Right, so a :span can have identical :start and :end. Oh, and also those can be open. So if the number falls on an endpoint, whether it’s “inside” the :span will depend whether that point is open or not, and also whether the span is zero-length.

Time for some more tests! I realize as I’m writing the first test how annoying and wordy it is to write (make-span x y :start-open? true :end-open? true), so I throw in a new function I haven’t even written yet, called make-open-span, which I assume will just by default set both ends to “open” status.

(fact "span-include? works for open spans"
  (span-include? (make-open-span 3 4) 3.5) => true
  (span-include? (make-open-span 3 4) 3) => false
  (span-include? (make-open-span 3 4) 4) => false
  (span-include? (make-open-span 3 4) 2) => false
  (span-include? (make-open-span 3 4) 7) => false

  (span-include? (make-open-span 4 3) 3.5) => true
  (span-include? (make-open-span 4 3) 3) => false
  (span-include? (make-open-span 4 3) 4) => false
  (span-include? (make-open-span 4 3) 2) => false
  (span-include? (make-open-span 4 3) 7) => false
  )


(fact "span-include? works for empty spans"
  (span-include? (make-open-span 3 3) 3) => false
  (span-include? (make-open-span 3 3) 4) => false
  (span-include? (make-open-span 3 3) 2) => false
  )


(fact "span-include? works for partially open spans"
  (span-include? (make-span 3 4 :end-open? true) 3) => true
  (span-include? (make-span 3 4 :end-open? true) 4) => false
  )


(fact "span-include? works for point spans"
  (span-include? (make-span 3 3) 3) => true
  (span-include? (make-span 3 3) 4) => false
  )

I also realize as I’m writing some of these that I’ve left room for there to be such a thing as an “empty span”: one with identical :start and :end values, and either endpoint open. Well, sortof: one has to wonder what a :span “really is” where both ends are identical values, but one is open and one is closed, like \([3,3)\). I elide this issue for the moment, just by saying “we’ll burn that bridge when we get to it,” and thinking something like “openness trumps closedness”.

Anyway, there’s some wrestling after this, and a few more definitions of helper functions and predicates, and I end up with the following function, which seems to handle all these edge cases:

(defn make-open-span
  "Takes `:start` and `:end` scalar numeric arguments
  and returns a doubly-open  `Span` record."
  [start end]
  (->Span start end true true))


(defn span-empty?
  "Takes a Span and returns `true` if it has
  identical `:start` and `:end`, and either is open"
  [span]
  (and (= (:start span) (:end span))
       (or (:start-open? span) (:end-open? span))))


(defn span-include?
  "Takes a Span record and a scalar, and returns
  true if the scalar falls strictly within the Span"
  [span n]
  (let [s   (:start span)
        e   (:end span)
        so? (:start-open? span)
        eo? (:end-open? span)]
    (cond
      (and (= s n) so?) false
      (and (= e n) eo?) false
      (and (not (span-empty? span)) (= s e n)) true
      :else (not= (compare s n) (compare e n)))))

This conditional result in the span-include? gets a bit complicated. The first two lines essentially say: “Is the point exactly on the end of the :span, and is that end also open?” The last one is the same old one that worked fine for the case of a non-degenerate closed :span. The third clause was pretty tricky, and to be honest I’m not entirely comfortable with it even yet. It exists to handle the possibility that the span is degenerate (same :start and :end) but not empty, when the point lands exactly on one (and thus both) of the endpoints.

Literally edge cases!

That said, I can’t think of any more situations, for whatever combination of degeneracy, openness, emptiness, and so forth.

Overlapping spans

The actual point of the :span Push type is to handle some interesting Computational Geometry benchmarks I’ve been throwing at GP systems for the last few years. These involve complex hulls and the “shadows” of polygons projected onto points in 2d spaces… plus a whole bunch of other stuff…. But the point here is that my notional “span” can be thought of as the shadow of some geometric object in a plane, and the goal of such a notion is whether and how much that “shadow” overlaps some other object. I’ve already handled the case where the “shadow” does or doesn’t include a 1d point. Now it’s time to handle the case(s) in which one :span overlaps another.

The easiest one for me to get my head around is the case where both spans are fully closed, and non-degenerate, and either overlapping on one end or nested inside one another.

(fact "span-overlap? works as expected for closed spans"
  (span-overlap? (make-span 3 5) (make-span 4 6)) => true
  (span-overlap? (make-span 3 5) (make-span 2 4)) => true
  (span-overlap? (make-span 3 5) (make-span 17 77)) => false
  )


(fact "span-overlap? works as expected for nested spans"
  (span-overlap? (make-span 3 8) (make-span 4 6)) => true
  (span-overlap? (make-span 3 4) (make-span 2 8)) => true
  )

Makes sense, right?

I write a very simple span-overlap? function here that invokes span-include? to see whether the endpoints of one :span fall within the other. This works, but it fails immediately for the very next test I write so I won’t even bother showing it to you. I suspect you could already write it yourself, if you wanted.

But here’s the first bad thing:

(fact "span-overlap? can notice touching-but-not-overlapping open spans"
  (span-overlap? (make-open-span 3 4) (make-span 4 6)) => false
  (span-overlap? (make-span 4 3) (make-open-span 2 3)) => false
  (span-overlap? (make-span 4 3) (make-span 2 3)) => true
  )

Aha, nope, that fails badly with just “is the end point of one inside the other” testing. The reason makes me hesitate and consider my fundamental design choice for the Span record and the span-include? functions: the failures occur when the open endpoint of one span falls inside the other, because when I check with span-include? to determine whether a point is inside a span (including overlapping its endpoint), there is no concept of whether the point is itself “open” or not. What does that even mean, to have an “open” point?

Madness, I tell you. But madness we will have to work with.

I add a few more checks to my clean and elegant span-overlap? predicate, to handle this failing test, and immediately—while I’m writing the code to fix this failure—recognize there’s another edge case that will fail as well. And then that case reminds me of another, and so on…. Let me write all of those here, too:

(fact "span-overlap? can notice touching-but-not-overlapping open spans"
  (span-overlap? (make-open-span 3 4) (make-span 4 6)) => false
  (span-overlap? (make-span 4 3) (make-open-span 2 3)) => false
  (span-overlap? (make-span 4 3) (make-span 2 3)) => true
  )


(fact "span-overlap? works as expected for overlapping open spans"
  (span-overlap? (make-open-span 3 8) (make-open-span 4 6)) => true
  (span-overlap? (make-open-span 3 8) (make-span 4 6)) => true
  (span-overlap? (make-span 3 8) (make-open-span 4 6)) => true
  (span-overlap? (make-open-span 3 4) (make-open-span 2 8)) => true
  (span-overlap? (make-open-span 3 4) (make-span 2 8)) => true
  (span-overlap? (make-span 3 4) (make-open-span 2 8)) => true
  )


(fact "span-overlap? works as expected for nested open spans"
  (span-overlap? (make-span 3 8) (make-open-span 4 6)) => true
  (span-overlap? (make-open-span 3 4) (make-span 2 8)) => true
  )


(fact "span-overlap? works for identical spans"
  (span-overlap? (make-span 3 4) (make-span 3 4)) => true
  (span-overlap? (make-span 3 4) (make-span 4 3)) => true
  (span-overlap? (make-open-span 3 4) (make-open-span 3 4)) => true
  )

I am sparing you some (much, actually) thrashing around here, just presenting you after the fact with where I ended up. Some of my tests were written wrong—always a sign that things are getting overwhelming and a place where you should start to get concerned about the usefulness of your abstractions. Several of my tests also didn’t catch other edge cases.

I start to think seriously about the new clojure.spec system for defining logical relationships, but I don’t actually want to use alpha software (as of this writing) or learn a whole new representation today. So I do that stupid thing (not foreshadowing this time) and try to pay better attention.

Along the way, I realize I also need to check the possibility that the two Span objects are not identical, but do coincide, in the sense that they are oriented in opposite directions with the same endpoints and openness. So I write a span-coincide? predicate as well.

I also discover that it’s more succinct and/or convenient to ask whether an endpoint is closed rather than open, so I end up with some helpers that check that for me as well. These are start-closed? and end-closed?, which simply contradict the built-in :start-open? and :end-open? fields of the Span record.

Here are the implementations of span-coincide? and span-overlap? that I ended up with:

(defn span-coincide?
  "Takes two spans, and returns `true` if they are
  identical, or if the reverse of the second is identical
  to the first"
  [span1 span2]
  (or (= span1 span2) (= span1 (span-reverse span2))))

(defn span-overlap?
  "takes two Span records, and returns true if they
  strictly overlap (taking into account openness of ends)"
  [span1 span2]
  (let [s1 (:start span1)
        e1 (:end span1)
        s2 (:start span2)
        e2 (:end span2)]
    (cond
      (and (span-include? span1 s2) (start-closed? span2)) true
      (and (span-include? span1 e2) (end-closed? span2)) true
      (and (span-include? span2 s1) (start-closed? span1)) true
      (and (span-include? span2 e1) (end-closed? span1)) true
      (and (span-include? span1 s2) (span-include? span1 e2)) true
      (and (span-include? span2 s1) (span-include? span2 e1)) true
      (span-coincide? span1 span2) true
      :else false
      )))

I’m not happy with it at the moment, but I can’t see a way to refactor it. I do think it’s interesting, though, how it explicitly “unfolds” the combinatorics of the edge cases in this problem into individual cases. Is the endpoint being tested open or closed? Are both end points of one span inside the other? Are the two Span items “the same”?

But I also dislike the visible repetition, and I want to refactor some or all of that away. I wonder how I can do that… and also how the “few minutes” to implement this type became an entire Saturday morning.

A bit of refactoring, and a whole new problem

I’d rather at least some of that repetition were removed to a separate function, just for clarity. So I pull out the thing that checks whether one end of a span is closed and falls inside the other, and also pull out the thing that checks to see if one span is surrounded by the other:

(defn contains-closed-end?
  "Takes two spans. Returns `true` if either end of the second one is closed AND falls within the first."
  [span1 span2]
  (let [s2 (:start span2)
        e2 (:end span2)]
    (or
      (and (span-include? span1 s2) (start-closed? span2))
      (and (span-include? span1 e2) (end-closed? span2)))))



(defn span-surrounds?
  "Takes two spans, and returns `true` if the first one completely surrounds the second one; that is if both ends fall strictly within the first span."
  [span1 span2]
  (let [s2 (:start span2)
        e2 (:end span2)]
    (and (span-include? span1 s2) (span-include? span1 e2))))



(defn span-overlap?
  "Takes two spans, and returns `true` if they strictly overlap (taking into account openness of ends)."
  [span1 span2]
  (let [s1 (:start span1)
        e1 (:end span1)
        s2 (:start span2)
        e2 (:end span2)]
    (cond
      (contains-closed-end? span1 span2) true
      (contains-closed-end? span2 span1) true
      (span-surrounds? span1 span2) true
      (span-surrounds? span2 span1) true
      (span-coincide? span1 span2) true
      :else false
      )))

I start feeling overwhelmed by all the edge cases, and have a sneaking suspicion I’ve missed something. I throw in a couple of tests that I’m sure will catch something I’ve slipped up on… and one of the most obvious-feeling tests fails right away.

(fact "span-overlap? works when an open end overlaps another"
  (span-overlap? (make-open-span 3 5) (make-open-span 4 6)) => true
  )

That is, an open-ended :span overlapping another one doesn’t trigger any of the conditions I’ve set up yet. Dagnabbit.

This starts to feel a bit like Whack-a-Mole. So I need another predicate, I suppose, which tests that an endpoint of one span falls “extra-strictly” within the other. “Extra-strictly” means, I guess, that the point should be detected by span-include? but also should not exactly equal either end point, even if both are closed.

Here’s the whole pile of stuff I’ve ended up with, just to get things to pass for all the edge cases I can imagine (so far) of points and :span overlaps, with open and closed ends on the :span items &c &c:

(defn span-include?
  "Takes a Span record and a scalar, and returns
  `true` if the scalar falls strictly within the Span"
  [span n]
  (let [s   (:start span)
        e   (:end span)
        so? (:start-open? span)
        eo? (:end-open? span)]
    (cond
      (and (= s n) so?) false
      (and (= e n) eo?) false
      (and (not (span-empty? span)) (= s e n)) true
      :else (not= (compare s n) (compare e n)))))



(defn contains-closed-end?
  "Takes two spans. Returns `true` if either end
  of the second one is closed AND falls within the first."
  [span1 span2]
  (let [s2 (:start span2)
        e2 (:end span2)]
    (or
      (and (span-include? span1 s2) (start-closed? span2))
      (and (span-include? span1 e2) (end-closed? span2)))))


(defn fully-contains-end?
  "Takes two spans. Returns `true` if either end
  of the second one falls within the first, and
  is not identical with either endpoint of the first."
  [span1 span2]
  (let [s1 (:start span1)
        e1 (:end span1)
        s2 (:start span2)
        e2 (:end span2)]
    (or
      (and (span-include? span1 s2) (not= s1 s2) (not= e1 s2))
      (and (span-include? span1 e2) (not= s1 e2) (not= e1 e2)))))




(defn span-surrounds?
  "Takes two spans, and returns `true` if the
  first one completely surrounds the second one;
  that is if both ends fall strictly within the first span."
  [span1 span2]
  (let [s2 (:start span2)
        e2 (:end span2)]
    (and (span-include? span1 s2) (span-include? span1 e2))))



(defn span-overlap?
  "Takes two spans, and returns `true` if they
  strictly overlap (taking into account openness of ends)."
  [span1 span2]
  (let [s1 (:start span1)
        e1 (:end span1)
        s2 (:start span2)
        e2 (:end span2)]
    (cond
      (contains-closed-end? span1 span2) true
      (contains-closed-end? span2 span1) true
      (fully-contains-end? span1 span2) true
      (fully-contains-end? span2 span1) true
      (span-surrounds? span1 span2) true
      (span-surrounds? span2 span1) true
      (span-coincide? span1 span2) true
      :else false
      )))

There’s a kind of code smell here I don’t like, but I also don’t know what to do about it. I’m writing finer and finer detectors, some of which (like this one) overlap the others in an almost-nested but not-quite-nested way. It feels a lot like the “open” vs “closed” abstraction is making a mess somehow. Makes me grumpy and frustrated, to be frank… but even more important, it makes me worry I’m missing some other kind of edge case that will also blow up, and that this new edge case will be so esoteric that I don’t even notice the failure.

Moving on…

That feeling of frustration means I should probably step away and maybe sleep on the whole thing. For the moment, all the tests pass, and I can’t think of any more that exercise this particular functionality. For something that I imaginatively thought would take “a few minutes”, it’s been a long day already.

Hell, I haven’t even gotten to the interesting stuff, which is the real point of these little objects: binary operators over them. Let me spend some Me Time now just listing them here, speculatively, so I have something nice and optimistic to look at in the morning.

Thinking back to the half-remembered implementation I did before, I recall things like this:

:span-extent takes two :span items, and returns a new one that’s the maximum length from the leftmost of all four endpoints, to the rightmost. The orientation of the result is always left-to-right.
:span-net takes two :span items, and returns a new one that connects the :start of the first to the :end of the second (ignoring all excursions and jumps in between)
:span-gap is like :span-net but it connects the :end of the first to the :start of the second.
:span-space connects the first :start to the second :start
:span-error connects the first :end to the second :end
:span-reverse swaps the :start and :end, and openness of one :span item
:span-close makes both ends of a :span closed
:span-open makes both ends of a :span open
:span-subtract takes two :span items; any portion of the first one overlapping the second one is deleted; if the result is an empty :span, both :start and :end become the old :end
:span-xor takes two :span items; imagine drawing the first, :start to :end, and then setting a “pen” back down at the :start of the second which “draws” on unoccupied empty space, but “erases” occupied space. Thus any region of mutual overlap between the two is removed from both. Note that two :spans can result, an that the orientation of both is preserved.
:span-intersection is something like :span-xor, but instead of drawing lines with a self-erasing pencil, draw them with a pencil that only makes a “real mark” wherever at least two lines cross; thus, the new :span is composed of only the part of the arguments that overlaps, and has an orientation the same as the second one; again, if the result is empty, its :start and :end are the same as the :end of the second argument
etc…

Yeah, that should keep me busy for a few more minutes. I’ll do a bit more tomorrow, I think.

Continued in “Adding a Span type to Push”…

It’s an old Buddhist joke about sandwiches or pizza or something. ↩
This “for numeric arguments” is a huge caveat I admonish you to explore on your own. It’s fun! Try some string comparisons, for example (compare "foo" "bar")…. ↩