When do continuous spans overlap?
Draft of 2016.07.02
May include: Push ↘ GP ↗ learning in public ↖ &c.
This morning I thought I’d spend a “few minutes” to address an old issue in the Klapaucius Push interpreter. Long ago, in some other genetic programming interpreter—I no longer recall which one—I remember implementing a type which described a continuous range of scalar values. I think it was called a span
, but it might also have been called a range
, but now I’m working in Clojure and a range
is a core function describing something rather different. I want mine to be a bit more abstract and even lazier than the Clojure function: more of a static structure, honestly.
In the last year it’s become surprisingly easy to add a new type to the Klapaucius interpreter, though as I look over it I see some places to improve and simplify even more. But that’s for another day. The basic workflow is (today):
- assuming the new type requires a Clojure
record
to manage it, create a new file likepush.type.definitions.span
, where thedefrecord
and any associated type-specific “helper” functions will be placed, plus arecognizer
predicate function that will tell me whether an arbitrary Clojure thingie is a Push:span
or not - create a Midje test file in
push.type.definitions.span_test
, and start implementing tests for the core functionality of the type - create a file like
push.type.item.span
, which actually constructs the Push type itself, defines its instructions with the Klapauciusbuild-instruction
macro and PushDSL code, addsaspect
-based automatically-generated instructions if you want any, and finally invokesmake-type
(ormake-module
) to compile it all into a single record - build a Midje test file like
push.instructions.base.span_test
to exercise the new type’s instructions and router, and make sure an arbitrary Push interpreter does what’s expected whenever any instruction is called - actually wire the fully-tested type (or module) into the interpreter templates, especially the one I’ve been calling
one-with-everything
1
Actually, now I look at this it doesn’t seem quite as “simple” as I imagined. It’s simpler than it has ever been, but I think very soon I will need to fix it quite a bit more. It should be three steps, easy to understand and explain and implement.
But not today.
Spans
The :span
type I have in mind represents a continuous set of real numbers, a “region of the number line”. As you probably remember from High School math, we speak of these regions as being “open” or “closed” on each end, too, depending on whether the number at the end is strictly part of the set or is just outside the set. For example, I’m envisioning the :span
\([3,4]\) includes the numbers 3 and 4 and every other number in between; \((3,4]\) does not include the number 3, just everything between it and 4 (and also 4); \((3,4)\) includes all the numbers between 3 and 4, but neither of those.
Unlike the mathematical convention—and mainly because I know random programs and evolutionary search are going to do this even if I don’t want them to—I’m willing to permit orientation here as well. That is, \([3,4]\) is not equal to \([4,3]\) in my scheme, but they do coincide with one another, in the sense they contain the exact same set of numbers. This doesn’t imply anything about the mathematics; it’s honestly just an arbitrary design decision I’ve made to see what happens (if anything).
I figure I’ll spend some time building some “simple infrastructure” (famous last words) in push.type.definitions.span
, because I know/remember a few things I want this type to do. Most of the simplest stuff comes “for free” when I add a Push aspect like equatable
or moveable
later on, so I can focus on the type-specific behavior here. What I have in mind are mainly relations between :span
items and :scalar
values:
- does the
:span
contain the:scalar
? - does the
:span
overlap another:span
? - does the
:span
contain another:span
?
Those seem pretty straightforward, but (foreshadowing!) I am starting to guess that with orientations and open vs closed ends, there might be some edge cases to handle.
Literal edge cases
Getting started is pretty easy, although I think for a bit about some possible architectures for the basic Clojure Span
record, to handle open vs closed ends. Obviously every Span
will have a :start
and :end
, because duh. But either or both of those could be “open” or “closed”, and one wonders how to handle that. One “simple” way is to add two new fields to the record that specify each end’s openness, something like :start-open?
and :end-open?
. Another way might involve making a variety of different Span
“subclasses”, like Span
and HalfSpan
and OpenSpan
, but that idea makes me back away. Another other way might involve leaving Span
with a single :start
and :end
, and then pushing “open” vs “closed” somehow down into the values of :start
and :end
. This last one is interesting, but odd; I wonder if there is such a thing as an “open number” vs “closed number”, because at least up to the point where this occurred to me, I only thought numbers were numbers.
While the last one feels philosophically interesting, I’m actually not here to do much philosophy. The first one feels like the simplest way to proceed, so I add
(defrecord Span [start end start-open? end-open?])
This is just raw Clojure, and to be honest I never like using the “free” constructor ->Span
outright, so I add a simple wrapper as well, like this:
(defn make-span "Takes `:start` and `:end` scalar numeric arguments and by default returns a closed `Span` record. Optionally takes `:start-open?` and `:end-open?` boolean keyword arguments, which can create a partially or fully open span." [start end & {:keys [start-open? end-open?] :or {start-open? false end-open? false}}] (->Span start end start-open? end-open?) )
I test that a bit while I’m writing it and for the most part beforehand, but I’m recording this narrative after the fact so let me show you how the tests end up. Just to give a sense of how Midje testing works, in case you’re landing here out of order:
(ns push.type.definitions.span_test (:use midje.sweet) (:use [push.type.definitions.span])) (fact "make-span returns a Span record with closed bounds" (make-span 9 6) => (->Span 9 6 false false) (make-span 1 1) => (->Span 1 1 false false)) (fact "make-span can take optional keyworded open bounds arguments" (make-span 9 6 :start-open? true) => (->Span 9 6 true false) (make-span 1 1 :end-open? true) => (->Span 1 1 false true) (make-span 2 3 :start-open? true :end-open? true) => (->Span 2 3 true true)) (fact "Span getters work as expected" (:start (make-span 3 4)) => 3 (:end (make-span 3 4)) => 4 (:start-open? (make-span 3 4)) => false )
Like I said, I’ve written most of these tests before I wrote code to make them pass, using the errors and failures produced by Marick’s excellent Midje framework to direct me what is needed after each new test is added.
The question of whether a number falls “within” a :span
seems like a good starting point after this infrastructure is done. Here are some tests I end up with:
(fact "span-include? works for closed spans" (span-include? (make-span 3 4) 3.5) => true (span-include? (make-span 3 4) 3) => true (span-include? (make-span 3 4) 4) => true (span-include? (make-span 3 4) 2) => false (span-include? (make-span 3 4) 7) => false (span-include? (make-span 4 3) 3.5) => true (span-include? (make-span 4 3) 3) => true (span-include? (make-span 4 3) 4) => true (span-include? (make-span 4 3) 2) => false (span-include? (make-span 4 3) 7) => false )
And the code I need to make those pass:
(defn span-include? "Takes a Span record and a scalar, and returns true if the scalar falls strictly within the Span" [span n] (let [s (:start span) e (:end span)] (not= (compare s n) (compare e n))))
I’m pretty pleased with that compare
sitting there. The Clojure compare
function is one of my favorite algorithmic gimmicks, it turns out. For numeric arguments2 it returns -1
if the first argument is strictly less than the second, 1
if it’s strictly larger, and 0
if they’re equal. Just sketching a bit, I realize that if the number is outside the span on the left or right, it will be (respectively) less than or greater than both ends of the :span
, regardless of the orientation of that :span
. If it “lands on” either end, it’ll be equal and not-equal….
Oh hang on.
Right, so a :span
can have identical :start
and :end
. Oh, and also those can be open. So if the number falls on an endpoint, whether it’s “inside” the :span
will depend whether that point is open or not, and also whether the span is zero-length.
Time for some more tests! I realize as I’m writing the first test how annoying and wordy it is to write (make-span x y :start-open? true :end-open? true)
, so I throw in a new function I haven’t even written yet, called make-open-span
, which I assume will just by default set both ends to “open” status.
(fact "span-include? works for open spans" (span-include? (make-open-span 3 4) 3.5) => true (span-include? (make-open-span 3 4) 3) => false (span-include? (make-open-span 3 4) 4) => false (span-include? (make-open-span 3 4) 2) => false (span-include? (make-open-span 3 4) 7) => false (span-include? (make-open-span 4 3) 3.5) => true (span-include? (make-open-span 4 3) 3) => false (span-include? (make-open-span 4 3) 4) => false (span-include? (make-open-span 4 3) 2) => false (span-include? (make-open-span 4 3) 7) => false ) (fact "span-include? works for empty spans" (span-include? (make-open-span 3 3) 3) => false (span-include? (make-open-span 3 3) 4) => false (span-include? (make-open-span 3 3) 2) => false ) (fact "span-include? works for partially open spans" (span-include? (make-span 3 4 :end-open? true) 3) => true (span-include? (make-span 3 4 :end-open? true) 4) => false ) (fact "span-include? works for point spans" (span-include? (make-span 3 3) 3) => true (span-include? (make-span 3 3) 4) => false )
I also realize as I’m writing some of these that I’ve left room for there to be such a thing as an “empty span”: one with identical :start
and :end
values, and either endpoint open. Well, sortof: one has to wonder what a :span
“really is” where both ends are identical values, but one is open and one is closed, like \([3,3)\). I elide this issue for the moment, just by saying “we’ll burn that bridge when we get to it,” and thinking something like “openness trumps closedness”.
Anyway, there’s some wrestling after this, and a few more definitions of helper functions and predicates, and I end up with the following function, which seems to handle all these edge cases:
(defn make-open-span "Takes `:start` and `:end` scalar numeric arguments and returns a doubly-open `Span` record." [start end] (->Span start end true true)) (defn span-empty? "Takes a Span and returns `true` if it has identical `:start` and `:end`, and either is open" [span] (and (= (:start span) (:end span)) (or (:start-open? span) (:end-open? span)))) (defn span-include? "Takes a Span record and a scalar, and returns true if the scalar falls strictly within the Span" [span n] (let [s (:start span) e (:end span) so? (:start-open? span) eo? (:end-open? span)] (cond (and (= s n) so?) false (and (= e n) eo?) false (and (not (span-empty? span)) (= s e n)) true :else (not= (compare s n) (compare e n)))))
This conditional result in the span-include?
gets a bit complicated. The first two lines essentially say: “Is the point exactly on the end of the :span
, and is that end also open?” The last one is the same old one that worked fine for the case of a non-degenerate closed :span
. The third clause was pretty tricky, and to be honest I’m not entirely comfortable with it even yet. It exists to handle the possibility that the span is degenerate (same :start
and :end
) but not empty, when the point lands exactly on one (and thus both) of the endpoints.
Literally edge cases!
That said, I can’t think of any more situations, for whatever combination of degeneracy, openness, emptiness, and so forth.
Overlapping spans
The actual point of the :span
Push type is to handle some interesting Computational Geometry benchmarks I’ve been throwing at GP systems for the last few years. These involve complex hulls and the “shadows” of polygons projected onto points in 2d spaces… plus a whole bunch of other stuff…. But the point here is that my notional “span” can be thought of as the shadow of some geometric object in a plane, and the goal of such a notion is whether and how much that “shadow” overlaps some other object. I’ve already handled the case where the “shadow” does or doesn’t include a 1d point. Now it’s time to handle the case(s) in which one :span
overlaps another.
The easiest one for me to get my head around is the case where both spans are fully closed, and non-degenerate, and either overlapping on one end or nested inside one another.
(fact "span-overlap? works as expected for closed spans" (span-overlap? (make-span 3 5) (make-span 4 6)) => true (span-overlap? (make-span 3 5) (make-span 2 4)) => true (span-overlap? (make-span 3 5) (make-span 17 77)) => false ) (fact "span-overlap? works as expected for nested spans" (span-overlap? (make-span 3 8) (make-span 4 6)) => true (span-overlap? (make-span 3 4) (make-span 2 8)) => true )
Makes sense, right?
I write a very simple span-overlap?
function here that invokes span-include?
to see whether the endpoints of one :span
fall within the other. This works, but it fails immediately for the very next test I write so I won’t even bother showing it to you. I suspect you could already write it yourself, if you wanted.
But here’s the first bad thing:
(fact "span-overlap? can notice touching-but-not-overlapping open spans" (span-overlap? (make-open-span 3 4) (make-span 4 6)) => false (span-overlap? (make-span 4 3) (make-open-span 2 3)) => false (span-overlap? (make-span 4 3) (make-span 2 3)) => true )
Aha, nope, that fails badly with just “is the end point of one inside the other” testing. The reason makes me hesitate and consider my fundamental design choice for the Span
record and the span-include?
functions: the failures occur when the open endpoint of one span falls inside the other, because when I check with span-include?
to determine whether a point is inside a span (including overlapping its endpoint), there is no concept of whether the point is itself “open” or not. What does that even mean, to have an “open” point?
Madness, I tell you. But madness we will have to work with.
I add a few more checks to my clean and elegant span-overlap?
predicate, to handle this failing test, and immediately—while I’m writing the code to fix this failure—recognize there’s another edge case that will fail as well. And then that case reminds me of another, and so on…. Let me write all of those here, too:
(fact "span-overlap? can notice touching-but-not-overlapping open spans" (span-overlap? (make-open-span 3 4) (make-span 4 6)) => false (span-overlap? (make-span 4 3) (make-open-span 2 3)) => false (span-overlap? (make-span 4 3) (make-span 2 3)) => true ) (fact "span-overlap? works as expected for overlapping open spans" (span-overlap? (make-open-span 3 8) (make-open-span 4 6)) => true (span-overlap? (make-open-span 3 8) (make-span 4 6)) => true (span-overlap? (make-span 3 8) (make-open-span 4 6)) => true (span-overlap? (make-open-span 3 4) (make-open-span 2 8)) => true (span-overlap? (make-open-span 3 4) (make-span 2 8)) => true (span-overlap? (make-span 3 4) (make-open-span 2 8)) => true ) (fact "span-overlap? works as expected for nested open spans" (span-overlap? (make-span 3 8) (make-open-span 4 6)) => true (span-overlap? (make-open-span 3 4) (make-span 2 8)) => true ) (fact "span-overlap? works for identical spans" (span-overlap? (make-span 3 4) (make-span 3 4)) => true (span-overlap? (make-span 3 4) (make-span 4 3)) => true (span-overlap? (make-open-span 3 4) (make-open-span 3 4)) => true )
I am sparing you some (much, actually) thrashing around here, just presenting you after the fact with where I ended up. Some of my tests were written wrong—always a sign that things are getting overwhelming and a place where you should start to get concerned about the usefulness of your abstractions. Several of my tests also didn’t catch other edge cases.
I start to think seriously about the new clojure.spec
system for defining logical relationships, but I don’t actually want to use alpha software (as of this writing) or learn a whole new representation today. So I do that stupid thing (not foreshadowing this time) and try to pay better attention.
Along the way, I realize I also need to check the possibility that the two Span
objects are not identical, but do coincide, in the sense that they are oriented in opposite directions with the same endpoints and openness. So I write a span-coincide?
predicate as well.
I also discover that it’s more succinct and/or convenient to ask whether an endpoint is closed rather than open, so I end up with some helpers that check that for me as well. These are start-closed?
and end-closed?
, which simply contradict the built-in :start-open?
and :end-open?
fields of the Span
record.
Here are the implementations of span-coincide?
and span-overlap?
that I ended up with:
(defn span-coincide? "Takes two spans, and returns `true` if they are identical, or if the reverse of the second is identical to the first" [span1 span2] (or (= span1 span2) (= span1 (span-reverse span2)))) (defn span-overlap? "takes two Span records, and returns true if they strictly overlap (taking into account openness of ends)" [span1 span2] (let [s1 (:start span1) e1 (:end span1) s2 (:start span2) e2 (:end span2)] (cond (and (span-include? span1 s2) (start-closed? span2)) true (and (span-include? span1 e2) (end-closed? span2)) true (and (span-include? span2 s1) (start-closed? span1)) true (and (span-include? span2 e1) (end-closed? span1)) true (and (span-include? span1 s2) (span-include? span1 e2)) true (and (span-include? span2 s1) (span-include? span2 e1)) true (span-coincide? span1 span2) true :else false )))
I’m not happy with it at the moment, but I can’t see a way to refactor it. I do think it’s interesting, though, how it explicitly “unfolds” the combinatorics of the edge cases in this problem into individual cases. Is the endpoint being tested open or closed? Are both end points of one span inside the other? Are the two Span
items “the same”?
But I also dislike the visible repetition, and I want to refactor some or all of that away. I wonder how I can do that… and also how the “few minutes” to implement this type became an entire Saturday morning.
A bit of refactoring, and a whole new problem
I’d rather at least some of that repetition were removed to a separate function, just for clarity. So I pull out the thing that checks whether one end of a span is closed and falls inside the other, and also pull out the thing that checks to see if one span is surrounded by the other:
(defn contains-closed-end? "Takes two spans. Returns `true` if either end of the second one is closed AND falls within the first." [span1 span2] (let [s2 (:start span2) e2 (:end span2)] (or (and (span-include? span1 s2) (start-closed? span2)) (and (span-include? span1 e2) (end-closed? span2))))) (defn span-surrounds? "Takes two spans, and returns `true` if the first one completely surrounds the second one; that is if both ends fall strictly within the first span." [span1 span2] (let [s2 (:start span2) e2 (:end span2)] (and (span-include? span1 s2) (span-include? span1 e2)))) (defn span-overlap? "Takes two spans, and returns `true` if they strictly overlap (taking into account openness of ends)." [span1 span2] (let [s1 (:start span1) e1 (:end span1) s2 (:start span2) e2 (:end span2)] (cond (contains-closed-end? span1 span2) true (contains-closed-end? span2 span1) true (span-surrounds? span1 span2) true (span-surrounds? span2 span1) true (span-coincide? span1 span2) true :else false )))
I start feeling overwhelmed by all the edge cases, and have a sneaking suspicion I’ve missed something. I throw in a couple of tests that I’m sure will catch something I’ve slipped up on… and one of the most obvious-feeling tests fails right away.
(fact "span-overlap? works when an open end overlaps another" (span-overlap? (make-open-span 3 5) (make-open-span 4 6)) => true )
That is, an open-ended :span
overlapping another one doesn’t trigger any of the conditions I’ve set up yet. Dagnabbit.
This starts to feel a bit like Whack-a-Mole. So I need another predicate, I suppose, which tests that an endpoint of one span falls “extra-strictly” within the other. “Extra-strictly” means, I guess, that the point should be detected by span-include?
but also should not exactly equal either end point, even if both are closed.
Here’s the whole pile of stuff I’ve ended up with, just to get things to pass for all the edge cases I can imagine (so far) of points and :span
overlaps, with open and closed ends on the :span
items &c &c:
(defn span-include? "Takes a Span record and a scalar, and returns `true` if the scalar falls strictly within the Span" [span n] (let [s (:start span) e (:end span) so? (:start-open? span) eo? (:end-open? span)] (cond (and (= s n) so?) false (and (= e n) eo?) false (and (not (span-empty? span)) (= s e n)) true :else (not= (compare s n) (compare e n))))) (defn contains-closed-end? "Takes two spans. Returns `true` if either end of the second one is closed AND falls within the first." [span1 span2] (let [s2 (:start span2) e2 (:end span2)] (or (and (span-include? span1 s2) (start-closed? span2)) (and (span-include? span1 e2) (end-closed? span2))))) (defn fully-contains-end? "Takes two spans. Returns `true` if either end of the second one falls within the first, and is not identical with either endpoint of the first." [span1 span2] (let [s1 (:start span1) e1 (:end span1) s2 (:start span2) e2 (:end span2)] (or (and (span-include? span1 s2) (not= s1 s2) (not= e1 s2)) (and (span-include? span1 e2) (not= s1 e2) (not= e1 e2))))) (defn span-surrounds? "Takes two spans, and returns `true` if the first one completely surrounds the second one; that is if both ends fall strictly within the first span." [span1 span2] (let [s2 (:start span2) e2 (:end span2)] (and (span-include? span1 s2) (span-include? span1 e2)))) (defn span-overlap? "Takes two spans, and returns `true` if they strictly overlap (taking into account openness of ends)." [span1 span2] (let [s1 (:start span1) e1 (:end span1) s2 (:start span2) e2 (:end span2)] (cond (contains-closed-end? span1 span2) true (contains-closed-end? span2 span1) true (fully-contains-end? span1 span2) true (fully-contains-end? span2 span1) true (span-surrounds? span1 span2) true (span-surrounds? span2 span1) true (span-coincide? span1 span2) true :else false )))
There’s a kind of code smell here I don’t like, but I also don’t know what to do about it. I’m writing finer and finer detectors, some of which (like this one) overlap the others in an almost-nested but not-quite-nested way. It feels a lot like the “open” vs “closed” abstraction is making a mess somehow. Makes me grumpy and frustrated, to be frank… but even more important, it makes me worry I’m missing some other kind of edge case that will also blow up, and that this new edge case will be so esoteric that I don’t even notice the failure.
Moving on…
That feeling of frustration means I should probably step away and maybe sleep on the whole thing. For the moment, all the tests pass, and I can’t think of any more that exercise this particular functionality. For something that I imaginatively thought would take “a few minutes”, it’s been a long day already.
Hell, I haven’t even gotten to the interesting stuff, which is the real point of these little objects: binary operators over them. Let me spend some Me Time now just listing them here, speculatively, so I have something nice and optimistic to look at in the morning.
Thinking back to the half-remembered implementation I did before, I recall things like this:
:span-extent
takes two:span
items, and returns a new one that’s the maximum length from the leftmost of all four endpoints, to the rightmost. The orientation of the result is always left-to-right.:span-net
takes two:span
items, and returns a new one that connects the:start
of the first to the:end
of the second (ignoring all excursions and jumps in between):span-gap
is like:span-net
but it connects the:end
of the first to the:start
of the second.:span-space
connects the first:start
to the second:start
:span-error
connects the first:end
to the second:end
:span-reverse
swaps the:start
and:end
, and openness of one:span
item:span-close
makes both ends of a:span
closed:span-open
makes both ends of a:span
open:span-subtract
takes two:span
items; any portion of the first one overlapping the second one is deleted; if the result is an empty:span
, both:start
and:end
become the old:end
:span-xor
takes two:span
items; imagine drawing the first,:start
to:end
, and then setting a “pen” back down at the:start
of the second which “draws” on unoccupied empty space, but “erases” occupied space. Thus any region of mutual overlap between the two is removed from both. Note that two:span
s can result, an that the orientation of both is preserved.:span-intersection
is something like:span-xor
, but instead of drawing lines with a self-erasing pencil, draw them with a pencil that only makes a “real mark” wherever at least two lines cross; thus, the new:span
is composed of only the part of the arguments that overlaps, and has an orientation the same as the second one; again, if the result is empty, its:start
and:end
are the same as the:end
of the second argument- etc…
Yeah, that should keep me busy for a few more minutes. I’ll do a bit more tomorrow, I think.
Continued in “Adding a Span type to Push”…