## Hacker's Delight

2013.02.28

This article expresses the author's opinion at the time of writing. There are no guarantees of correctness, originality or current relevance. Copying the whole article is forbidden. Transcription of selected parts is allowed, provided that the author and this source are mentioned.

Some years ago, Guilherme (a friend from the "Conectiva Guild") sent me a book named Hacker's Delight. It is a collection of very low-level CS algorithms. Bit-fiddling, binary arithmetic, etc.

Very interesting stuff, but it is not the kind of knowledge that you apply in everyday work. Most programmers will never implement a linked list or tree, let alone deal with bits. It might be of interest for a compiler writer, perhaps.

The trick that I learnt from the book and liked most is the speed-up of integer division by a constant n. If the machine word has, say, 32 bits, the basic idea is to multiply by (2^32/n), rounded to an integer. The result is a double word; the leftmost (most significant) 32 bits are the quotient. I found that gcc actually uses this trick.
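As a sketch of the idea, here is division by the constant 5 on a 32-bit machine. The magic constant and shift below are, I believe, what compilers typically emit for this case (multiply by 0xCCCCCCCD, keep the high word, shift two more); `BigInt` stands in for the CPU's 32×32 → 64-bit widening multiply:

```javascript
// ceil(2^34 / 5) = 0xCCCCCCCD; 5 * MAGIC = 2^34 + 1, so the
// rounding error is always too small to disturb the quotient.
const MAGIC = (2n ** 34n + 4n) / 5n;

function div5(x) {
  // 64-bit product; the quotient sits in the bits above position 34
  return Number((BigInt(x) * MAGIC) >> 34n);
}

console.assert(div5(4294967295) === 858993459);  // works up to 2^32 - 1
```

One multiply plus a shift replaces a division instruction, which is much slower on most CPUs.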

Now I am implementing (slowly) an HP-16C emulator. This model was a "programmer's calculator", so it implemented integer arithmetic and bitwise operations.

Even though Javascript has bitwise operators, I found some difficulties while using the native operations. The HP-16C allows a variable word size of up to 64 bits, while JS bitwise operations are limited to 32 bits. The calculator supports unsigned, one's-complement and two's-complement modes, while JS is always two's complement. And the HP-16C has carry and overflow flags that are set and cleared by arithmetic operations, like in real CPUs.
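A quick demonstration of the 32-bit, two's-complement straitjacket of JS bitwise operators:

```javascript
// JS bitwise operators truncate their operands to 32-bit
// signed integers before operating:
console.log(0xFFFFFFFF | 0);   // -1, not 4294967295
console.log(2 ** 32 | 0);      // 0 -- bit 32 is simply lost
console.log(1 << 31);          // -2147483648: always two's complement
```

So anything involving 64-bit words, or a sign convention other than two's complement, has to be built by hand.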

Given these constraints, I needed more bitwise power than JS can offer, and the one's-complement arithmetic could be a problem if I used existing libraries like BigInteger. Besides, it would be fun to NIH and implement some bitwise routines from scratch :)

My implementation uses Javascript arrays, with each bit stored as an array element. This forced me to implement all the bitwise operations: invert, shift, basic (unsigned) sum, comparison, etc. On top of that, I implemented the arithmetic, plus a lot of unit tests to verify the results for every word size and sign mode.
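A minimal sketch of the bit-array idea (my actual implementation differs; the helper names and the bit order here are illustrative — index 0 holds the least significant bit):

```javascript
// Convert a non-negative number to a fixed-width bit array, LSB first
function to_bits(n, width) {
  var bits = [];
  for (var i = 0; i < width; i++) {
    bits.push((n >>> i) & 1);
  }
  return bits;
}

// Convert a bit array back to a number
function from_bits(bits) {
  var n = 0;
  for (var i = bits.length - 1; i >= 0; i--) {
    n = n * 2 + bits[i];
  }
  return n;
}

// Unsigned sum over bit arrays, with a carry flag as the calculator needs
function bit_sum(a, b) {
  var out = [], carry = 0;
  for (var i = 0; i < a.length; i++) {
    var s = a[i] + b[i] + carry;
    out.push(s & 1);
    carry = s >> 1;
  }
  return { bits: out, carry: carry };
}
```

With this representation, word size is just the array length, and the carry flag falls out of the sum loop for free.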

One thing I found is that neither the book nor the Internet references were crystal clear about how binary multiplication and division are performed. Because of this, I think it is worth publishing an edited version of my implementation.

The code is edited for brevity. Very basic operations like plus, minus, bit shifts and comparison are represented as standard JS operators. I wish I could do the same in my real implementation; it is unfortunate that JS does not allow operator overloading like e.g. Python does.

```javascript
/* core of multiplication (handles positive numbers only) */
core_multiply = function (a, b)
{
    var c = 0;

    while (a != 0) {
        if (! (a & 1)) {
            // even a; transform to (a/2)*(b*2)
            a >>>= 1;
            b <<= 1;
        } else {
            // odd a
            c += b;
            a -= 1;
        }
    }

    return c;
};
```

The algorithm above has roots in the "pencil-and-paper" algorithm that we learn in elementary school. The main difference is that intermediate results are "stored" in the parameters themselves, while a human keeps all intermediate results on paper and sums them at the end.

A degenerate version of the algorithm can be conceived by removing the first leg of the "if". Multiplication is then implemented exactly as it is defined: repeated summation. Of course, it is not efficient.

To see how it works (or to convince ourselves that it works at all), let's trace the equivalent algorithm using decimal numbers:

| A | B | C (result) |
|-------|--------|------------------------|
| 30301 | 15 | 0 |
| 30300 | 15 | 15 (0+15) |
| 3030 | 150 | 15 |
| 303 | 1500 | 15 |
| 302 | 1500 | 1515 (15+1500) |
| 301 | 1500 | 3015 (1515+1500) |
| 300 | 1500 | 4515 (3015+1500) |
| 30 | 15000 | 4515 |
| 3 | 150000 | 4515 |
| 2 | 150000 | 154515 (4515+150000) |
| 1 | 150000 | 304515 (154515+150000) |
| 0 | 150000 | 454515 (304515+150000) |

The binary version is simpler because digits can only be 0 or 1, and scaling up/down is achieved by fast bitshifts.
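Tracing `core_multiply` on small inputs shows the binary mechanics; here is 13 × 5 (the function is restated so the snippet stands alone):

```javascript
var core_multiply = function (a, b) {
    var c = 0;
    while (a != 0) {
        if (! (a & 1)) { a >>>= 1; b <<= 1; }  // even: halve a, double b
        else           { c += b;   a -= 1;  }  // odd: accumulate b
    }
    return c;
};

// Trace of core_multiply(13, 5), i.e. 1101 x 101:
//
//   a (bin)   b    c
//   1101      5    0    a odd:  c += 5, a -= 1
//   1100      5    5    a even: halve a, double b
//   0110     10    5    a even again
//   0011     20    5    a odd:  c += 20
//   0010     20   25    a even
//   0001     40   25    a odd:  c += 40
//   0000     40   65    done: 13 * 5 = 65
console.assert(core_multiply(13, 5) === 65);
```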

It is easier to handle signs outside this algorithm, following the familiar arithmetic rules. In my real implementation, I also handle an idiosyncrasy of the one's-complement format: the "negative zero" (all bits 1). In this mode, signs need care even when multiplying 0 × 0! Similar situations arise in floating point, I think.
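A tiny sketch of where the negative zero comes from (8-bit words, hypothetical helper name): in one's complement, negation is plain bit inversion, so inverting zero yields a word of all ones.

```javascript
// One's-complement negation is bit inversion within the word size
function oc_negate(x, width) {
    return ~x & ((1 << width) - 1);
}

console.assert(oc_negate(5, 8) === 250);  // 0xFA = -5 in 8 bits
console.assert(oc_negate(0, 8) === 255);  // 0xFF: the "negative zero"
```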

```javascript
/* Return sign of a value */
my_sgn = function (a)
{
    /* actually implemented by testing the MSB */
    return (a < 0 ? -1 : +1);
};

/* Return absolute (positive) value */
my_abs = function (a)
{
    /* actually implemented using bit inversion */
    return (a < 0 ? -a : a);
};

/* multiplication front-end with signed numbers */
multiply = function (a, b)
{
    var sgn = my_sgn(a) * my_sgn(b);
    a = my_abs(a);
    b = my_abs(b);
    var c = core_multiply(a, b) * sgn;

    return c;
};
```

The following functions are auxiliary to the division algorithm; they are not interesting by themselves and are listed for completeness.

```javascript
/* Magnitude is one less than the number of significant
 * bits in a word. This might be very easy to implement in
 * assembly, since some CPUs have an "nlz" (Number of
 * Leading Zeros) instruction. The result can also be
 * understood as the floor of the base-2 logarithm of the
 * argument.
 */
magnitude = function (a)
{
    var m = -1;

    while (a) {
        ++m;
        a >>>= 1;
    }

    return m;
};

/* Returns the n-th power of 2. In binary, the implementation is trivial */
pow2 = function (n)
{
    return 1 << n;
};
```

Division was more challenging, and again the algorithms I found were not clear about how they worked. I ended up with this one, following the same strategy as multiplication: beginning with a naïve, slow version and adding an optimization.

```javascript
/* division core. only positive numbers */
core_divide = function (a, b)
{
    var c = 0;

    while (a >= b && a != 0 && b != 0) {
        var diff = magnitude(a) - magnitude(b) - 1;

        if (diff <= 0) {
            // naive division algorithm
            c++;
            a -= b;
        } else {
            // a > 2b: subtract a shifted multiple of b at once
            var p2 = pow2(diff);
            c += p2;
            a -= b * p2;
        }
    }

    // returns [quotient, remainder]
    return [c, a];
};
```

It is interesting to appreciate how the algorithm works, again with decimals:

| A | B | C |
|--------------------------|----|------------------|
| 454516 | 15 | 0 |
| 304516 (454516 - 150000) | 15 | 10000 |
| 154516 (304516 - 150000) | 15 | 20000 |
| 4516 (154516 - 150000) | 15 | 30000 |
| 3016 (4516 - 1500) | 15 | 30100 |
| 1516 (3016 - 1500) | 15 | 30200 |
| 16 (1516 - 1500) | 15 | 30300 |
| 1 (remainder) | 15 | 30301 (quotient) |

As we did with multiplication, the signs are handled outside of the core division algorithm. Another twist of division is the sign that should be attributed to the remainder. I followed the convention of giving the remainder the same sign as the dividend.

```javascript
/* Division front-end function, handles signed values */
divide = function (a, b)
{
    var sgn = my_sgn(a) * my_sgn(b);
    var sgna = my_sgn(a);
    a = my_abs(a);
    b = my_abs(b);

    var res = core_divide(a, b);
    var c = res[0];
    // dividend became the remainder
    a = res[1];

    // fix quotient sign
    if (sgn < 0) {
        c = -c;
    }

    // remainder inherits the sign from the dividend
    if (sgna < 0) {
        a = -a;
    }

    // returns signed [quotient, remainder]
    return [c, a];
};
```
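The sign convention is easiest to see with a few concrete calls. This is a self-contained stand-in for `divide` using plain JS numbers (not the bit-array version), just to show the expected outputs:

```javascript
// Stand-in for divide(): truncating quotient, remainder
// carrying the dividend's sign
function divide(a, b) {
    var q = Math.floor(Math.abs(a) / Math.abs(b));
    var r = Math.abs(a) % Math.abs(b);
    var sgn = (a < 0) !== (b < 0) ? -1 : 1;
    return [sgn < 0 ? -q : q, a < 0 ? -r : r];
}

// divide( 7,  2) -> [ 3,  1]
// divide(-7,  2) -> [-3, -1]   remainder takes the dividend's sign
// divide( 7, -2) -> [-3,  1]
// divide(-7, -2) -> [ 3, -1]
```

This matches the convention of C's `/` and `%` operators, where `(a/b)*b + a%b === a` always holds.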

The integer square root was completely new to me. Fortunately, in this case, the algorithm is clearly specified in the book; it was just a matter of copying it.

It does not follow the pencil-and-paper approach; it is a controlled trial-and-error loop (Newton's method). The optimization is to never let the estimate go up and then back down, so the halting condition is simplified.

```javascript
/* Integer square root
 * Equivalent to Math.floor(Math.sqrt(value))
 * Algorithm from "Hacker's Delight", p. 205
 */
sqrt = function (a)
{
    if (a == 0) {
        // avoid the degenerate case of a = 0
        return 0;
    }

    // first estimate is a bit above
    // [a >> (significant bits / 2)]
    var magn = magnitude(a);
    var g0 = pow2(1 + Math.floor(magn / 2));
    var g1 = g0 + divide(a, g0)[0];

    // g1 = (g0 + a / g0) / 2
    g1 >>>= 1;

    while (g1 < g0) {
        g0 = g1;
        g1 = g0 + divide(a, g0)[0];
        g1 >>>= 1;       // g1 = (g0 + a / g0) / 2
    }

    // FIXME return remainder
    return g0;
};
```

This one is probably the most interesting to see in action. The argument is 1474357 (21 significant bits), whose integer square root is 1214.

| g0 estimate | a / g0 | g1 estimate | |
|-------------|--------|-------------|--------------------|
| 2048 (2^11) | 719 | 1383 | g1 < g0, continue |
| 1383 | 1066 | 1224 | g1 < g0, continue |
| 1224 | 1204 | 1214 | g1 < g0, continue |
| 1214 | 1214 | 1214 | g1 >= g0, stop |

As you can see, it converges very fast. If I needed to calculate square roots using pencil and paper, I'd probably use this method instead of the ugly algorithm that we baby-boomers had to learn at school, only to forget it some weeks later. (I don't believe they teach it nowadays.)

As the book clearly explains, the trickiest part is finding a good initial estimate of the square root, because there is a hard requirement: it must be greater than the final result (the algorithm's halting condition depends on that). The algorithm converges faster when the first estimate is close, but spending too much CPU on the first estimate might not pay off.

Our first estimate is very simple. For example, if the square has 8 significant bits (i.e. a number between 128 and 255), the first estimate will be 16, because it has 5 significant bits (8 divided by 2, plus 1). The square of 16 is 256, so it is guaranteed to be bigger than the actual square root, which must have 4 significant bits (the integer square root of 128 is 11, and of 255 is 15).
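The claim can be checked mechanically. This sketch uses `Math.log2` in place of `magnitude` so it stands alone (the helper name `first_guess` is mine):

```javascript
// First estimate of sqrt(a): 2^(1 + floor(magnitude(a) / 2)),
// which always exceeds the true integer square root
function first_guess(a) {
    var magn = Math.floor(Math.log2(a));   // = magnitude(a)
    return 2 ** (1 + Math.floor(magn / 2));
}

// e.g. for a = 200 (8 significant bits): first_guess is 16,
// while floor(sqrt(200)) = 14
console.assert(first_guess(200) === 16);

// the hard requirement holds across a range of inputs
for (var a = 1; a <= 5000; a++) {
    console.assert(first_guess(a) > Math.floor(Math.sqrt(a)));
}
```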