Recently, while I was reading “Writing
Efficient Programs”, I came across a beautiful variation of binary
search and its implementation. In this post I’ll be using D’s
metaprogramming capabilities to implement a generic version of the
algorithm.
For the rest of the post I’m going to assume you’re familiar with
binary search, and its most common implementation–the bisection method
using two pointers.
A beautiful algorithm
There are many ways to implement a binary search. The most common is
probably using a “bisection search”, where we track the subintervals
using two pointers, one moving from left to right, the other from right
to left.
Another variant is the “uniform search”, where instead of using two
pointers, we use only a single pointer and a “rate of change”, i.e., the
start and the size of the subintervals. This is more subtle than
bisection search–which is not trivial by the way. In Knuth’s words:
It is possible to do this, but only if extreme care is paid to the
details, […]. Simpler approaches are doomed to failure.
On the other hand, uniform search has some advantages. One of them is
that the the rates of change can be precalculated, and stored in a side
table. If we get the rate of change calculation right–which is the
subtle part–the search algorithm is simpler than its cousin using two
pointers.
A variation of uniform search is Shar’s algorithm. It does away with
the side table, and uses power of two interval sizes.
It starts by comparing the the key we are looking for, K, with Ki, where i = 2k, k = ⌊lgN⌋.
If K < Ki,
the interval sizes are 2k − 1, 2k − 2, …, 1, 0. But if K > Ki,
we set i = N − 2l + 1,
where l = ⌈lg(N−2k+1)⌉,
adjust the interval pointer, and the interval sizes are 2l − 1, 2l − 2, …, 1, 0.
Shar’s algorithm determines the position of the entry with key K bit by bit. Each test adds one
more bit. We need one more test, after we’ve determined all the bits, to
see if the entry is actually in the table.
A beautiful implementation
Bentley provides a beautiful version of Shar’s algorithm in his book.
The code works for tables with 1000 elements. The code in the book is
written in Pascal, but transliterated to D it looks like this:
int bsearch1000(int[1001] xs, int x)
{
auto i = 0;
if (xs[512] <= x) { i = 1000 - 512 + 1; }
if (xs[i + 256] <= x) { i += 256; }
if (xs[i + 128] <= x) { i += 128; }
if (xs[i + 64] <= x) { i += 64; }
if (xs[i + 32] <= x) { i += 32; }
if (xs[i + 16] <= x) { i += 16; }
if (xs[i + 8] <= x) { i += 8; }
if (xs[i + 4] <= x) { i += 4; }
if (xs[i + 2] <= x) { i += 2; }
if (xs[i + 1] <= x) { i += 1; }
if (xs[i] == x) return i;
else return 0;
}
There’s a few things going on here. First, the odd array size, 1001.
Pascal arrays are one-based. D follows in C’s footsteps with zero-based
arrays. We just ignore xs[0]
in this case. This is a bug by
the way. Bentley acknowledges it, but he doesn’t provide a fix since he
considers it would detract from the exposition. We can fix it by setting
i
to 1
initially, and making the necessary
code adjustments. This would complicate the code somewhat. Another way
to fix it by explicitly checking if i
is 0
in
the last test.
Second, the code fully unrolls the search loop. This is only possible
because we know the size of the table beforehand. The code can be
adjusted for other table sizes as needed.
What makes this code beautiful is that it’s about as efficient as it
could be. It’s also uniform, and relatively easy to understand if you
know about Shar’s algorithm. It’s an almost word for word rendition of
the algorithm tailored for this particular table size.
Beautiful as it is, Bentley’s code is somewhat tedious to adjust to
other table sizes, and it’s easy to make mistakes while calculating the
initial value of i
. The code is very repetitive and
uniform. This is a strong hint that we can automate writing it.
This is where D’s powerful metaprogramming capabilities come in. If
we can determine the size of the table at compile time, we could in
principle generate the code for the unrolled loop automatically.
Let’s focus on the first of these problems: determining the size of
the table. If the table is implemented with a static array we can
determine its size at compile time, since the size is part of the
type.
Although I expected D’s standard library to have a utility for
exactly this case, I couldn’t find one. But, as it turns out, it’s quite
easy to write:
template StaticArraySize(T)
if (isStaticArray!T)
{
static if (is(T == E[n], E, int n))
enum StaticArraySize = n;
}
We can use the is
expression to check if T
is a static array and extract its size. That is neat. The extra check
isStaticArray!T
is not strictly necessary, but it improves
the error message if StaticArraySize
is called with a type
that’s not a static array.
Like C++, the static if
is evaluated at compile time.
Unlike C++, it doesn’t introduce a new scope. The enum
is
going to be generated in the scope of the template
. And
since the enum
has the same name as the
template
, the whole value of the template
is
going to be n
, a compile time value.
With the StaticArraySize
out of the way, we can focus on
code generation. The algorithm has three parts:
-
The initial test, comparing the search key K with Ki for i = 2k, k = ⌊lgN⌋;
-
Determining successive bits of the candidate position by
iterating over powers of two; and -
Checking if the we found the element we are looking for.
I’m going to show the function signature first, since it’s a bit
complicated, and I’m going to explain every bit of it after.
auto bsearch(T, U = ArrayElementType!T, int n = StaticArraySize!T, int k = iflog2(n - 1))(T xs, U x)
{
While this looks complicated, it’s cleaner than what we’d get in C++.
Let’s take each template parameter in turn and see what it means.
T
is the table type. U
is the element type.
ArrayElementType
is a template very similar to
StaticArraySize
that evaluates to the type of the elements
of an array–static or dynamic, it doesn’t matter. n
is the
size of the array. Like C++, D can have both type and value template
parameters. We use StaticArraySize
as the default value of
n
–remember, StaticArraySize
is a compile time
value. This allows us to pass the size of the array explicitly, but I’m
just using it as a compile time assignment in this case. Not that the
call to StaticArraySize
also acts as a type guard on
T
. Instantiating the template with a T
that is
not a static array will fail. Then we determine k, the power of two where we start
the search. iflog2
stands for integer floor
logarithm in base 2. This is a regular D function, but
it can be evaluated at compile time when called with a compile time
value, which is what we do here.
Phew, that was a lot. Compare to the template parameters, the regular
parameters are almost boring: the table xs
and the key
x
we are looking for.1
The initial test is just the code rendition of the test in Shar’s
algorithm:
auto p = 0;
if (xs[pow(2, k)] <= x) p = (n - 1) - pow(2, k) + 1;
We track the position of the candidate element in p
.
Now for the fun bit, generating the power of two tests:
static foreach_reverse (int i; 0 .. k)
{
if (xs[p + pow(2, i)] <= x) p += pow(2, i);
}
This code is remarkably short thanks to the problem’s regularity we
mentioned earlier, and to D’s powerful metaprogramming capabilities.
Like the static if
, a static foreach
doesn’t
introduce a new scope. The code inside is just “spliced” in the code of
the surrounding function. In effect, this snippet generates a series of
if
statements equivalent to the one in
bsearch1000
. We use foreach_reverse
to iterate
over the the exponents k to
1–the range 0 .. k
is
open, and we’re iterating over it in reverse. The choice of
foreach_reverse
as a keyword is somewhat unfortunate. There
may be a cleaner way of achieving the same result, but I don’t use D
regularly, so this is what I came up with :^).
Once we’ve determined all the bits of the candidate element position
p
all that’s left to do is to check if the element we’re
looking for is at that position.
if (p > 0 && xs[p] == x) return p;
return 0;
}
And with this we’re done. If we check the code generated for
bsearch1000
and bsearchDI
on the
compiler explorer. we see that’s it’s virtually the same up. The D
compiler even inlined the compile time constants for us.
Conclusion
Shar’s algorithm is a beautiful variation of binary search. If we
know the table size in advance we can write an efficient binary search
algorithm using it.
Bentley’s implementation is also beautiful because it squeezes every
bit of performance from the code. The code does no more than it should,
and there is a Zen kind of beauty to it.
D’s metaprogramming capabilities allowed us to take Bentley’s
beautiful code, and make it more general. We can generate the code for
any table size if we know it at compile time.
While I didn’t show it here, we can use similar techniques to
determine if we’re dealing with a static or dynamic array at compile
time. If we have a static array we can use our version of binary search.
Otherwise we can fall back to a more general version. Andrei
Alexandrescu calls this technique “Design by Introspection”.
D is a fun language to program in. It shines for this kind of
problems, where you can make heavy use of metaprogramming. I haven’t
used any other language where this problem could be expressed as
cleanly, except maybe Common Lisp. I’d be curious to see a solution in
Rust (which has a strong macro system) and Zig (which I read has strong
support for metaprogramming as well).