Monday 1 February 2016

In Delphi, is addition commutative?

Answer: no, not necessarily. Despite the existence of operator precedence, i.e. the fact that the following

  X := 3 + 4 * 5;

results in 23 and not in 35, the order of operands can still have an effect.

In my BigInteger code, I discovered an odd error, that only happened in some very rare cases, and only in PUREPASCAL code, i.e. code that did not use assembler, and only in 64 bit.

It took me several hours to find out that this was the problematic expression:

  Value := NormDividend[I + J] + NormDivisor[I] + Value shr CDivLimbBits;

CDivLimbBits is a constant with value 32, Value is an Int64, NormDividend[] and NormDivisor[] are arrays of UInt32 (Cardinal). Only in some very special circumstances, this caused an error.

What happened?

In this unit, which does lots of odd and ugly things to UInt32s and to speed things up, I turned off range and overflow checks, so it went unnoticed that NormDividend[I + J] + NormDivisor[I] caused an overflow. Since overflow and range checks were off, the 33rd bit simply got cut off.

But you might say: "Hey, the third operand is an Int64, so why were these two operands not promoted to 64 bit?" It turns out that this only happens once it is required, so what the compiler actually compiles is:

  UInt32(Intermediate) := UInt32(NormDividend[I + J]) + UInt32(NormDivisor[I]);
  Value := Int64(Intermediate) + Value shr 32;

while I expected:

  Value := Int64(NormDividend[I + J]) + Int64(NormDivisor[I]) + Value shr 32;

Now, if you rearrange the Int64 to come first, like:

  Value := Value shr 32 + NormDividend[I + J] + NormDivisor[I];

then all is well. The first operand is an Int64, so all following operands are promoted too and you really get:

  Value := Value shr 32 + Int64(NormDividend[I + J]) + Int64(NormDivisor[I]);

Note that this error did not happen in 32 bit code. There, NormDividend[] and NormDivisor[] are arrays of UInt16, and Value is an Int32. In other words, in 32 bit code (and even in 64 bit code on Windows), everything seems to be promoted to Int32 (signed 32 bit integer) anyway, probably because that type is somehow the preferred type for integer expressions (most integer code uses 32 bit registers, in 32 bit as well as in 64 bit).

So take care to either cast to the required type, or to put the largest operand first, otherwise you might be in for a surprise. It certainly was a surprise to me, especially because the data I had used for unit testing had not caught this.

Only the fact I wanted to improve the speed of ToString (converting the largest known prime to a string of 22 million decimal digits still takes approx. 2'30", while converting it to a string of — much more easily convertible — hex digits only takes 135 ms), and the coincidence that in one test I had to divide by exactly 10^57, made me catch this error. Note that the assembler code did not suffer from this. There I can control exactly what gets promoted and when.

This also made me aware again of the fact that testing can only show the presence of errors, and never the absence, and that it is extremely hard to find test cases that cover everything. The fact I had to divide by a number that caused the error was sheer coincidence.