The **fast inverse square root algorithm**, made famous (though not invented) by programming legend John Carmack in the Quake 3 source code, computes an approximation of an inverse square root in a way that, at first glance, looks completely *wild*.

This post is about *understanding* how it works.

I’m not the first to write about this algorithm, and I surely won’t be the last, but my aim is to show *every* step of the process. A lot of really fantastic resources out there still skip over steps in derivations, or fail to highlight apparently obvious points. My goal is to remove any and all magic from this crazy algorithm.

It’s important to note that this algorithm is very much *of its time*. Back when Quake 3 was released in 1999, computing an inverse square root was a slow, expensive process. The game had to compute hundreds or thousands of them per second in order to solve lighting equations and other 3D vector calculations that rely on normalization. These days, on modern hardware, a calculation like this would likely not take place on the CPU at all, and even if it did, it would be fast thanks to far more advanced dedicated floating point hardware.

The code’s own comments, *“evil floating point bit level hacking”* and *“what the fuck?”*, give you a sense of how it reads on first contact. There are already some fantastic explanations of it out there, but I do want to dig in quite a bit further.
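For reference, here's the function as it appears (give or take whitespace) in the GPL release of the Quake III Arena source, original comments intact:

```c
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
```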

One of the key ideas behind this algorithm, and the reason it works, is that the raw bit pattern of a float, when interpreted as a 32-bit signed integer, essentially approximates a scaled and shifted `log2(x)` function.

Logarithms have a bunch of rules, properties, and identities that can be exploited to make computing the inverse square root easy for a computer, using only simple operations like adds and shifts (though there are some supporting floating point multiplications, which we’ll talk about later).

In order to make sense of what it even means to interpret the bit pattern of a float, we need to look at how floats are represented in memory, and how the “value” of a float is derived from that representation.

An IEEE-754 32-bit float can be regarded as a struct, which holds 3 members. Using C’s bit-field notation here:

```c
struct float_raw {
    uint32_t mantissa : 23;
    uint32_t exponent : 8;
    uint32_t sign     : 1;
};  // field order assumes an LSB-first bit-field layout, as on typical little-endian compilers
```

- **Sign:** 1 bit which indicates whether the number is positive or negative
- **Exponent:** 8 bits which dictate the range that this number will fall into
- **Mantissa:** 23 bits which linearly specify where exactly in that range the number lives

The following equation shows how the actual numerical value is derived from those three fields:

$$
N = (-1)^S \times 2^{E-127} \times \left(1 + \frac{M}{2^{23}}\right)
$$

Or, if we break some of the variables out:

$$
m = \frac{M}{2^{23}}
$$

$$
B = 127
$$

$$
e = E - B
$$

$$
N = (-1)^S \times 2^{e} \times (1 + m)
$$

Note the little trick of raising $-1$ to the power of the sign bit to turn a 0 or a 1 into a 1 or a -1. Also notice that instead of simply multiplying by $m$, we multiply by $1 + m$. This is the *implicit leading one* of normalized floats: the mantissa stores only the fractional part, so the value always lands between $2^e$ and $2^{e+1}$, even when $M$ is zero.

Let’s take an example like the (approximate) number `-1.724`. Its underlying representation breaks down as: sign bit $S = 1$, biased exponent $E = 127$ (a true exponent of 0), and mantissa $M \approx 0.724 \times 2^{23} \approx 6{,}073{,}352$.
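If you want to see those fields for yourself, here's a minimal sketch that copies the bits of the float into an integer and masks out each field:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    float f = -1.724f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);          // reinterpret the bit pattern safely

    uint32_t sign     = bits >> 31;          // 1 bit
    uint32_t exponent = (bits >> 23) & 0xFF; // 8 bits, biased by 127
    uint32_t mantissa = bits & 0x7FFFFF;     // 23 bits, fractional part only

    printf("S = %u, E = %u, M = %u\n", sign, exponent, mantissa);
    // Expected: S = 1, E = 127, M = 6073352
    return 0;
}
```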

One interesting thing is that the exponent is actually stored in a *biased* format. The actual exponent value is the stored 8-bit field minus a bias of 127, which lets both negative and positive exponents be stored as an unsigned quantity.

The next complexity is that an exponent field of all zeros marks a *subnormal* (or denormal) number, which is decoded differently:

$$
N = (-1)^S \times 2^{-126} \times m
$$

The exponent is fixed at -126, and the mantissa value doesn’t have an added value of one (in fact it’s implicitly zero), which is what lets floats represent values extremely close to zero.

When the exponent field is all ones, the bit pattern represents one of the special values `NaN` and `Infinity`/`-Infinity`. If the mantissa is zero, the value is positive or negative infinity, depending on the sign bit. If the mantissa is non-zero, the value is `NaN` (not a number), which is used to signify when an illegal operation has taken place, like dividing zero by zero or taking the square root of a negative number.

Of course, typically this internal representation is completely irrelevant to the programmer; they can just perform calculations and get results. Indeed, William Kahan notes in his 1998 presentation “How Java’s Floating-Point Hurts Everyone Everywhere”:

> Error-analysis tells us how to design floating-point arithmetic, like IEEE Standard 754, moderately tolerant of well-meaning ignorance among programmers

The idea being that “numerical sophistication” is not necessary to make effective use of floating point.

But that said, an intimate familiarity with the format can lead to some clever designs. We looked at how the parts of a float translate to a decimal number, but we can also describe those same parts in terms of an integer representation, in mathematical form:

$$
I_x = 2^{31}S + 2^{23}E + M
$$

That $2^{23}$ factor is going to come up a lot, so let’s give it a name:

$$
L = 2^{23}
$$

$$
I_x = 2^{31}S + LE + M
$$

And since we’re mainly going to be talking about taking square roots of numbers, we can assume the sign is positive ($S = 0$), which leaves us with:

$$
I_x = LE + M
$$
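To convince yourself that this really is the same integer you'd get by reinterpreting the bits directly, here's a small sketch of my own:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    float x = 3.5f;
    uint32_t I_x;
    memcpy(&I_x, &x, sizeof I_x);            // raw bit pattern as an integer

    uint32_t E = (I_x >> 23) & 0xFF;         // biased exponent field
    uint32_t M = I_x & 0x7FFFFF;             // mantissa field
    uint32_t L = 1u << 23;

    printf("%u == %u\n", I_x, L * E + M);    // both sides print the same value
    return 0;
}
```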

If you take a closer look at the raw bits of a float, you can make some interesting observations.

The first is that every *range* – that is, the set of numbers which can be represented by any given exponent value – has half the precision of the range before it: the same number of representable values is spread over an interval twice as wide.

For example, take the exponent value 127 (a true exponent of 0). The mantissa allows for 8388607 (`(1 << 23) - 1`) distinct steps from 1 to just below 2. Contrast that with exponent value 128 (a true exponent of 1): the same number of steps now has to cover everything from 2 to just below 4, a range twice as wide, so the gap between neighbouring representable values doubles.

This relationship is *logarithmic*. If you take a series of evenly-spaced floating point numbers - say 256 of them - starting at 0, increasing by 0.25 each time, and interpret the bit pattern as an integer, you get the following graph:

Now if we plot the result of taking `log2(x)` of those same 256 float values, we get this curve.

Obviously the actual values on the graphs are wildly different, and the first one is much more *steppy*, but it's clear that the first is a close approximation of the second.

The first graph is what you might call a *piecewise linear approximation*, which has been scaled and shifted by a specific amount. Perhaps unsurprisingly, the amount it's scaled and shifted by is related to the structure of a float!

$$
\log_2(x) \approx \frac{I_x}{L} - B
$$

Here, $I_x$ is the integer interpretation of the bits of $x$. If we plot $\frac{I_x}{L} - B$ on top of `log2(x)`, we get:

Again, not a perfect mapping, but a pretty good approximation! We can also sub in the floating point terms, assuming a sign bit of zero (a positive number) and a *normal* number:

$$
I_x \approx \left(\log_2(1 + m_x) + e_x + B\right) \times 2^{23}
$$
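To make the quality of that approximation concrete, here's a small sketch of my own that prints the scaled, shifted bit pattern next to the true `log2(x)` for a handful of values:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    const double L = 1 << 23;   // 2^23
    const double B = 127.0;     // exponent bias
    float samples[] = { 0.25f, 1.0f, 1.5f, 4.0f, 10.0f, 100.0f };

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        float x = samples[i];
        uint32_t I_x;
        memcpy(&I_x, &x, sizeof I_x);
        double approx = I_x / L - B;              // scaled, shifted bit pattern
        printf("x=%8.2f  I_x/L - B = %9.5f  log2(x) = %9.5f\n",
               x, approx, log2(x));
    }
    return 0;
}
```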

Forgetting the integer representation for just a second, the log of a floating point number alone would be expressed as:

$$
\log_2(1 + m_x) + e_x
$$

Since we already know that the integer conversion is a *linear approximation*, we can make this approximate equivalence:

$$
\log_2(1 + x) \approx x + \sigma
$$

The sigma ($\sigma$) here is a small correction constant that we're free to choose. Over the interval $0 \le x < 1$, the curve $\log_2(1 + x)$ stays close to the straight line $x$, so adding a constant offset lets us approximate it *linearly*.
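To get a feel for what $\sigma$ is doing, compare the two sides at a few points, taking $\sigma \approx 0.045$ (the value we'll derive from the Quake constant later):

$$
\begin{aligned}
\log_2(1 + 0) &= 0 & \quad 0 + \sigma &= 0.045 \\
\log_2(1 + 0.5) &\approx 0.585 & \quad 0.5 + \sigma &= 0.545 \\
\log_2(1 + 1) &= 1 & \quad 1 + \sigma &= 1.045
\end{aligned}
$$

Without the offset, the straight line $x$ sits entirely below the curve on $(0, 1)$; nudging it up by a small constant trades error at the endpoints for accuracy in the middle.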

With all of that in mind, we can focus our attention back on the thing we're (now indirectly) attempting to compute: the inverse square root.

When we work with the raw bits of a float, we are essentially operating on a logarithm of that value. Logarithms have been carefully studied for a long time, and they have many known properties and identities. For example:

$$
\log_2(x^y) = y \times \log_2(x)
$$

We can also note that:

$$
\sqrt{x} = x^{0.5}
$$

Since we're asking for an answer to the question:

$$
y = \frac{1}{\sqrt{x}}
$$

which we can reformulate as

$$
y = \frac{1}{x^{0.5}}
$$

and even simpler:

$$
y = x^{-0.5}
$$

We can take our log formulas from before, and state that:

$$
\log_2\left(\frac{1}{\sqrt{x}}\right) = \log_2\left(x^{-0.5}\right) = -0.5 \times \log_2(x)
$$
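As a quick sanity check with a concrete number, take $x = 16$:

$$
\log_2\left(\frac{1}{\sqrt{16}}\right) = \log_2(0.25) = -2,
\qquad
-0.5 \times \log_2(16) = -0.5 \times 4 = -2
$$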

Plugging in the floating point values now:

$$
\log_2(1 + m_y) + e_y \approx -0.5 \times \left(\log_2(1 + m_x) + e_x\right)
$$

$$
m_y + \sigma + e_y \approx -0.5 \times (m_x + \sigma + e_x)
$$

And then getting the floating point constants back into their integer component form:

$$
\frac{M_y}{L} + \sigma + E_y - B \approx -0.5 \times \left(\frac{M_x}{L} + \sigma + E_x - B\right)
$$

We can do some algebra on this expression to turn it into one where both sides contain something that looks like a raw floating point bit pattern (an integer of the form $LE + M$):

$$
\frac{M_y}{L} + \sigma + E_y \approx -0.5 \times \left(\frac{M_x}{L} + \sigma + E_x - B\right) + B
$$

$$
\frac{M_y}{L} + E_y \approx -0.5 \times \left(\frac{M_x}{L} + \sigma + E_x - B\right) + B - \sigma
$$

$$
\frac{M_y}{L} + E_y \approx -\frac{1}{2}\left(\frac{M_x}{L} + \sigma + E_x - B\right) + B - \sigma
$$

$$
\frac{M_y}{L} + E_y \approx -\frac{1}{2}\left(\frac{M_x}{L} + E_x - B\right) + B - \frac{3}{2}\sigma
$$

$$
\frac{M_y}{L} + E_y \approx -\frac{1}{2}\left(\frac{M_x}{L} + E_x\right) - \frac{3}{2}(\sigma - B)
$$

$$
L\left(\frac{M_y}{L} + E_y\right) \approx L\left(-\frac{1}{2}\left(\frac{M_x}{L} + E_x\right) - \frac{3}{2}(\sigma - B)\right)
$$

$$
L\left(\frac{M_y}{L} + E_y\right) \approx -\frac{1}{2}L\left(\frac{M_x}{L} + E_x\right) - \frac{3}{2}L(\sigma - B)
$$

$$
LE_y + M_y \approx -\frac{1}{2}\left(LE_x + M_x\right) - \frac{3}{2}L(\sigma - B)
$$

$$
LE_y + M_y \approx -\frac{1}{2}\left(LE_x + M_x\right) + \frac{3}{2}L(B - \sigma)
$$

That is quite a mouthful - although all of the operations performed here are simple enough. With all the variable swapping done, and both sides containing groups that include proper, honest-to-goodness integer part floating point representations, we can group them back up and get:

$$
I_y \approx -\frac{1}{2}I_x + \frac{3}{2}L(B - \sigma)
$$

This is quite a significant moment. On the left hand side, we've got the integer interpretation of $y$, the *value* we're trying to compute; on the right, nothing but the integer interpretation of $x$, halved, negated, and added to a precomputed constant. *This is the famous line*:

```c
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
```

A bit shift to the right by one place is a cheap way to halve the integer, and subtracting it from a constant handles the negation, so `i >> 1` and the subtraction cover the $-\frac{1}{2}I_x$ term. The hex constant must therefore be the $\frac{3}{2}L(B - \sigma)$ term, precomputed. But where does `0x5f3759df` come from? Assuming a sigma value of 0:

$$
\frac{3}{2}L(B - \sigma) = \frac{3}{2} \cdot 2^{23} \times 127 = 1598029824
$$

`1598029824` in hexadecimal is `0x5f400000`, which, as you'll note, is close to, but *not quite*, the magic constant from Quake. It's off by `566817`, and we can use this information to compute the actual sigma value used in the game:

$$
\frac{3}{2} \cdot 2^{23} \times 127 - \frac{3}{2} \cdot 2^{23}(127 - \sigma) = 566817
$$

$$
\frac{3}{2}\left(2^{23} \times 127 - 2^{23} \times 127 - 2^{23}(-\sigma)\right) = 566817
$$

$$
\frac{3}{2}\left(-2^{23}(-\sigma)\right) = 566817
$$

$$
-2^{23}(-\sigma) = \frac{566817}{1.5}
$$

$$
2^{23}\sigma = 377878
$$

$$
\sigma = \frac{377878}{2^{23}}
$$

$$
\sigma \approx 0.04504656
$$

That sigma value was chosen by someone to optimise the approximation, but interestingly, it isn't actually the *optimal* value (more on that later), *and* it isn't actually known who came up with it! I've left all the math in so as to remove any possibility of this being a "magic" constant; it's really anything but! In C:

```c
#include <math.h>
#include <stdint.h>

int32_t compute_magic(void) {
    double sigma = 0.0450465;
    double expression = 1.5 * pow(2.0, 23.0) * (127.0 - sigma);
    int32_t i = expression; // plain truncating cast
    return i;               // -> 0x5f3759df
}
```

Note that doubles are used here, not floats, and that the integer form is just a plain old cast, not an interpretation of the bit pattern.
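And a minimal usage sketch, assuming the `compute_magic` function above is in scope:

```c
#include <stdint.h>
#include <stdio.h>

int32_t compute_magic(void); // defined above

int main(void) {
    printf("0x%x\n", (unsigned)compute_magic()); // prints 0x5f3759df
    return 0;
}
```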

```c
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
```

That single line computes an inverse square root approximation on a floating point number by realising that the raw bit pattern is an approximate log, and then exploiting identities and algebra, as well as extremely fast operations like shifting and addition.
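Pulling the pieces so far together, this first, bit-twiddling half of the algorithm can stand on its own as a rough approximation. Here's a sketch of my own (using `memcpy` rather than the original pointer cast, to keep the compiler happy):

```c
#include <stdint.h>
#include <string.h>

// Rough 1/sqrt(x) using only the log-approximation trick, no Newton step yet.
static float rsqrt_rough(float x) {
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);      // the bits of x, read as an integer (~ scaled log2)
    i = 0x5f3759df - (i >> 1);     // halve, negate, and add the magic constant
    memcpy(&y, &i, sizeof y);      // reinterpret the result as a float again
    return y;                      // within a few percent of 1/sqrt(x) for normal x
}
```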

I've often heard this algorithm referred to as a "hack". Now, I'm not one to put down a hacky solution, but a hack this is not. This is absolutely a solid, well thought-out piece of engineering, employed to compute an expensive operation thousands of times per second on the under-powered hardware of the day.

I'll make a quick note here that this algorithm will *only* work with "normal" floating point numbers. A "sub-normal" (that is, a tiny number *very* close to zero) will make the approximation fall apart, because the log approximation assumes the implicit leading one of the $1 + m$ form, which sub-normal numbers don't have.

The approximation described above is pretty good, but definitely contains measurable error. That's where the next line comes in.

```c
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
```

This line improves the approximation by a significant margin, by utilising an algorithm called Newton's method, or the Newton-Raphson method. This is a generic, iterative mathematical technique for finding the roots (zeros) of a function. You might wonder how that could be helpful here, since we aren't looking for a zero. Well, we already have our approximation of $\frac{1}{\sqrt{x}}$ from the bit trick, and we can construct a function whose zero is exactly the value we're after:

$$
f(y) = \frac{1}{y^2} - x = 0
$$

Squaring the candidate $y$ and taking the reciprocal should give back $x$, so $f(y)$ is zero precisely when $y = \frac{1}{\sqrt{x}}$, which means finding the root of $f$ finds our answer.

Newton's method works like this: given an initial approximation $y_n$, a better approximation $y_{n+1}$ can be computed as:

$$
y_{n+1} = y_n - \frac{f(y_n)}{f'(y_n)}
$$

Where $f'(y_n)$ is the *derivative* of $f$ evaluated at $y_n$.

So what is the derivative of our $f$? First, let's rewrite it with a negative exponent:

$$
f(y) = y^{-2} - x
$$

Taking the derivative of a function in this form follows the power rule (the added constant just disappears):

$$
\frac{d}{dx}\left(x^n + c\right) = nx^{n-1}
$$

So we get:

$$
\frac{d}{dy} f(y) = -2y^{-3}
$$

$$
\frac{d}{dy} f(y) = -\frac{2}{y^3}
$$

That gives us this expression for the better approximation:

$$
y_{n+1} = y_n - \frac{\frac{1}{y_n^2} - x}{-\frac{2}{y_n^3}}
$$

There is a problem with this form, however. A very large part of why this algorithm is fast is because it avoids floating point divisions, and the above equation has 3 of them! Fortunately, our very good friend algebra has our back again, and we can rearrange the expression into this form:

$$
y_{n+1} = y_n\left(1.5 - 0.5x \cdot y_n^2\right)
$$

No divisions, only multiplications! The exact steps to go from the first form to this one are *numerous* to say the least, but I've included them in full for completeness. Feel free to skip over them and pick back up on the code below.

$$
y_{n+1} = y_n - \frac{\frac{1 - xy_n^2}{y_n^2}}{-\frac{2}{y_n^3}}
$$

$$
y_{n+1} = y_n - \frac{1 - xy_n^2}{y_n^2} \cdot \left(-\frac{y_n^3}{2}\right)
$$

$$
y_{n+1} = y_n - \frac{-1 + xy_n^2}{y_n^2} \cdot \frac{y_n^3}{2}
$$

$$
y_{n+1} = y_n - \frac{-1 + xy_n^2}{\cancel{y_n^2}} \cdot \frac{\cancel{y_n^2}\,y_n}{2}
$$

$$
y_{n+1} = y_n - \left(\left(-1 + xy_n^2\right) \cdot \frac{y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(-\left(1 - xy_n^2\right) \cdot \frac{y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(-1 \cdot \left(1 - xy_n^2\right) \cdot \frac{y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(\left(-1 \cdot 1 + -1 \cdot \left(-xy_n^2\right)\right) \cdot \frac{y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(-1 \cdot \frac{y_n}{2} + xy_n^2 \cdot \frac{y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(-\frac{y_n}{2} + \frac{xy_n^2 y_n}{2}\right)
$$

$$
y_{n+1} = y_n - \left(\frac{-y_n + xy_n^3}{2}\right)
$$

$$
y_{n+1} = y_n - \left(\frac{y_n \cdot -1 + xy_n^3}{2}\right)
$$

$$
y_{n+1} = y_n - \left(\frac{y_n \cdot -1 + y_n\left(xy_n^2\right)}{2}\right)
$$

$$
y_{n+1} = y_n - \left(\frac{y_n\left(-1 + xy_n^2\right)}{2}\right)
$$

$$
y_{n+1} = y_n \cdot \frac{2}{2} - \frac{y_n\left(-1 + xy_n^2\right)}{2}
$$

$$
y_{n+1} = \frac{2y_n}{2} - \frac{y_n\left(-1 + xy_n^2\right)}{2}
$$

$$
y_{n+1} = \frac{2y_n - y_n\left(-1 + xy_n^2\right)}{2}
$$

$$
y_{n+1} = \frac{2y_n + y_n\left(-1\left(-1 + xy_n^2\right)\right)}{2}
$$

$$
y_{n+1} = \frac{y_n\left(2 - 1\left(-1 + xy_n^2\right)\right)}{2}
$$

$$
y_{n+1} = \frac{y_n\left(2 + 1 - xy_n^2\right)}{2}
$$

$$
y_{n+1} = \frac{y_n\left(3 - xy_n^2\right)}{2}
$$

$$
y_{n+1} = y_n\left(\frac{3}{2} - \frac{xy_n^2}{2}\right)
$$

$$
y_{n+1} = y_n\left(\frac{3}{2} - \frac{x}{2} y_n^2\right)
$$

$$
y_{n+1} = y_n \cdot \left(1.5 - \left(0.5x \cdot y_n \cdot y_n\right)\right)
$$

So that is the last line of the function before the return:

```c
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
```

Amazingly, that ends up yielding a maximum relative error of about 0.175% (and often a far lower one). Normally, Newton's method is applied iteratively to obtain closer and closer approximations, but in the case of the Quake code, only a single iteration was used. In the original source, a second iteration is present, but is commented out.

```c
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
```
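If you want to verify that error figure for yourself, a small harness along these lines (my own sketch; `Q_rsqrt` is the function shown near the top) sweeps a range of inputs and reports the worst relative error it sees:

```c
#include <math.h>
#include <stdio.h>

float Q_rsqrt(float number); // the Quake function shown earlier

int main(void) {
    float worst = 0.0f;
    for (float x = 0.01f; x < 10000.0f; x *= 1.001f) {
        float approx = Q_rsqrt(x);
        float exact  = 1.0f / sqrtf(x);
        float err    = fabsf(approx - exact) / exact;  // relative error
        if (err > worst) worst = err;
    }
    printf("worst relative error: %.4f%%\n", worst * 100.0);
    // With the single Newton iteration this lands around 0.17%.
    return 0;
}
```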

This algorithm is outright astonishing. It builds on a deep knowledge of the internal mathematical details of the floating point number system, an understanding of what runs fast and slow on a computer, some nimble algebraic gymnastics, and a centuries-old root finding method discovered by none other than Isaac Newton himself, and solves a problem that was a computational bottleneck at a particular period of history.

I mentioned that Carmack did not actually come up with this himself (though I wouldn't put it past him!). The truth is the exact origin is not 100% certain. There's something kind of incredible about that, too.

And believe it or not, this rabbit hole actually goes even deeper. Mathematician Chris Lomont wrote up a paper trying to find the optimal value for sigma in the log approximation step. It's definitely worth a look if this hasn't fully satisfied your curiosity about the subject.

Lastly, I recently wrote about CORDIC, an algorithm for computing sines and cosines without floats, using only addition and bit shifting. Some folks had asked in the comments about its similarity to the fast inverse square root algorithm. I replied that it wasn't *really* that similar, this algorithm being all about floating point, bit level interpretations, and root-finding.

But then I stopped to actually think about it, and while there are large differences in the details of the algorithm, there is a lot of *spirit* in common. Specifically, making clever mathematical observations, and bringing those to bear efficiently on the hardware constraints of the time.

Some people look at algorithms like CORDIC and fast inverse square root, and think them only relics of the past; a technology with no utility in the modern world. I don't think I have to tell you that I disagree with that premise.

A lot of us get into this field because, as kids, we loved to crack things open and see how they worked (even if, sometimes, we couldn't put them back together afterwards). Algorithms such as these live in that same space for me. I've tried to keep that curious spark alive, and turn it on problems and technology that aren't immediately relevant to my everyday work. And the really crazy thing is that often the underlying elements *do* help me solve real problems! Knowledge is synthesisable, who would have thought.