Home » Uncategorized » Replacing modulo

Replacing modulo

Let’s build a counter like a clock. We want to start from 0 until 23 and then back to 0.

A first try could be:

def hour():
    i = 0
    while True:
        i += 1
        yield i % 24

def main():
    for i in hour():

if __name__ == '__main__':

Suppose we have already profiled our program and this is its bottleneck. We know that modulo is an expensive operation sometimes. That is why we strive for its replacement by other more elementary operations.

If the hours were a power of 2 (like 16 or 32), we could employ a trick using binary operations which are very efficient since the all numbers are internally represented in the binary system and a computer is so “familiar” with those operations that he can execute them really fast. The following code demonstrates this trick were we suppose that the number of hours in a day is 32 (we wish!).

def hour():
    i = 0
    while True:
        i += 1
        yield i & 31

def main():
    for i in hour():

if __name__ == '__main__':

Notice that the trick does not work for values which are not powers of 2. Why does it work in these cases? All the magic is due to the & operator which denotes the and operation. The and operation takes as input two binary digits (0 or 1) and outputs a binary digit as follows: 0 & 0 = 0, 0 & 1 = 0, 1 & 0 = 0, 1 & 1 = 0. It is easy to observe that it always outputs 0 unless both its inputs are 1. There are two key observations here.

  1. If an input is always 1, then and outputs the contents of its other input.
  2. If an input is always 0, then and outputs always 0.

In the decimal system, every digit has a weight which is a power of 10. For instance the number 1234 = 1 * 10^3 + 2 * 10^2 + 3 * 10^1 + 4 * 10^0. Similarly, in the binary system every digit has a weight which is a power of 2. 0101 = 0 * 2^3 + 1 * 2^2 + 0 * 2 ^1 + 0 * 2^0 = 5. As we would expect, as a power 10 in the decimal system is an 1 followed by as many zeros as the exponent of that power, the same holds for the binary system. So, 2^5 = 100000 = 32. Now, substracting 1 by 32 would leave us with: 31 = 011111 = 31.

When the Python interpreter executes the statement `i & 31` imagine a procedure like the following:

  1. Find the value of i (it already is in binary form).
  2. Write the number 31 in binary form.
  3. Pad (from the left) with extra zeros if need be.
  4. Execute the and operation

Every digit of i is unaltered by this operation except the highest order digits, since the 5 least significant digits of 31 are all 1. On the other hand, the higher order digits are 0, so the generator hour() can not yield a number larger than 31. Since after 31 is the number 32 which is represented as (100000) and then it is masked (the technical term when we apply the operation and to a number with another known number) with (011111) the result is always (000000). So, the counter start over again.

But what if we would like a more general approach even for numbers which are not a power of 2? Then, we could use a more general approach like:

from math import log
from math import ceil

def almostmodulo(x, y):
    higher_power = 2 ** ceil(log(y, 2)) - 1
    remains = higher_power - y + 1
    if ((x + remains) & higher_power == 0):
        return 0
        return x + 1

The operation of finding the higher power of 2 is relatively simply using a binary operation called shift and is really needed only once throughout the execution of the program. We see that we use the same trick, but use the variable remains to cover the difference with the higher power of two. The catch here is that the control flow costs too much and that is why we do not see this code in practice. We could have removed it if we knew the variable remains contains the number 1 (find out how. ) but then we fall back to the previous case! Be aware that the last code does not work when y is a power of 2, ie, in the cases that the second one works!

So, what is the meaning of this post? Which technique should we use to write super fast code? Well, the answer is trivial. Use your profiler firstly and then decide what you should optimize. Then, if it comes down to such a detail, you should probably consider writing the whole thing in another language like C. The purpose of this post was to demonstrate a few tricks and to give a clue how could the internal representation of a program and its data which are depended on the computer architecture could affect the decision of a programmer. Even though most likely the aforementioned programmer would not be one who uses a high level language, but someone who writes compilers.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: