Some time ago I became interested in the Sieve of Eratosthenes for finding all prime numbers below a given limit. The naive application of the algorithm is not very fast, I tried to optimize it to achieve good performances. Here are ways to make this algorithm more efficient. Source files in C or Python will be provided throughout the article.
- What is a prime number ?
- Sieve of Eratosthene.
- Optimisations.
- Performances.
- To put it in a nutshell.
What is a prime number – A prime number is an integer which admits only two divisors, 1 and itself. Which excludes all other natural numbers (the sieve of Eratosthenes is based on this definition). There are an infinity of prime numbers.
Sieve of Eratosthene - The Sieve of Eratosthenes algorithm is very simple. Consider an array containing all the integers from 2 to n (if you want to compute all the prime numbers bellow n) which we assume all primes. For each element in the array, we will eliminate its multiples. The pseudo code could be as follows:
Sieve(integer n):
array = integers from 2 to n
for each i in array
delete_multiples of i in array
end for
end Sieve
This pseudo-code can be easily implemented in Python :
def delete_multiples(i, array):
for x in array:
if x > i and x % i == 0:
array.remove(x)
def Sieve(n):
array = [i for i in range(2, n)]
for i in array:
delete_multiples(i, array)
return array
def main():
array = Sieve(100)
print array
main()
Note that the complexity of this implementation is not at all suitable for very large values of n (try with n > 10000, the result will take a long time to be calculated). Let’s see how to optimize it.
Optimisations – First point, if storing all numbers in an array and then removing them if they aren’t primes may seem more readable, it’s not a good method if you want performance. A first optimization is to declare an array of n boxes each of which can contain either 1 or 0 (true or false) depending on the primality (or non-primality) of the number corresponding to the index of the array. We will see that with this technique we can completely avoid comparisons and tests (except for the loops of course), which will greatly reduce the computation time.
Second, we note that if we want to find all the multiples of a number i in our array, it is not necessary to browse the entire table by testing each time the modulo. To do this simply, we could start from the index i and then go from i in i in the array, and we are certain to have all the multiples of i (including itself). This will greatly improve our algorithm.
Another remark that can be made is that to test the primality of a number n, it is not necessary to test the division by all numbers from 2 to n, just go to the square root of n (a number beyond cannot be a divisor).
Finally, a small improvement related to the C implementation of the algorithm, since we will have to reset our array to 1, we could immediatly remove all the even numbers (since 2 is the only one which is prime). We can therefore initialize our array with two separate loops, one that will eliminate all the even numbers, and the other that will set to 1 the odd numbers. You can the the program in C by downloading the following file (sources: www.pythux.com/exemples/erato/crible_1.c). To compile the source file, you can use either gcc or clang (or other C compiler) like this:
1 2 | <strong>clang</strong> -lm -O3 crible_1.c <strong>gcc</strong> -lm -o3 crible_1.c |
We could stop there, because performance is already very good (11 ms for a Sieve up to one million and one billion until 40-45 seconds), but we’ll see that it is still possible to reduce drastically the computation time. In fact, we know that apart from the number 2, no even number is prime. So we could find a way to store only the odd numbers in the array. We therefore divide the memory footprintby two. For a Sieve up to n = 30, we would have to store the following numbers:
1 2 | Numbers -> <strong>[</strong>1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29<strong>]</strong> Offsets -> <strong>[</strong>0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14<strong>]</strong> |
But we cannot simply use the index in the array to know the numbers involved. In fact, one can notice that to find the number associated with each cell of the array, we need to calculate: « step = index * 2 + 1″. Second, to find the multiples of a number i in the array, we need to go through an array using a step of i starting from the cell (index + i). Finally, we note that with this tweak, the algorithm is more compact and we can simply initialize the array to 1 (with the function memset from library <string> of the C language). The source code corresponding to this implementation is as follows (source: www.pythux.com/examples/erato/crible.c):
char *erato_opti(int n)
{
char *tab = malloc(n / 2);
int i = 1, j, borne = sqrt(n / 2), step;
tab = memset(tab, 1, n / 2);
for (; i <= borne; i++)
{
step = 2 * i + 1;
for (j = i + step; j < n / 2; j += step)
tab[j] = 0;
}
return (tab);
}
If we want to display the prime numbers produced by the function erato_opti we just use the trick of « number = index * 2 + 1″ and do not forget to display the number 2 (which is not in the array):
void print_erato_space(char *tab, int n)
{
int i = 1;
printf("2\n");
for (; i < n / 2; i++)
if (tab[i])
printf("%i\n", i * 2 + 1);
}
You can compile the source file with one of the following command:
1 2 | <strong>clang</strong> -lm -O3 crible.c <strong>gcc</strong> -lm -O3 crible.c |
1 |
Performances – Here is a summary of the performances obtained with the different implementations :
Value of N 1000 10.000 1.000.000 1.000.000.000 Pypy's implementation 30ms 370ms / / CPython's implementation 42ms 710ms / / Implementation in C 1ms 1ms 7ms 40s Fastest implementation 1ms 1ms 6ms 32s
To put it in a nutshell - We saw some possible optimizations of the algorithm of the Sieve of Eratosthenes. This article is far from complete, you can find other optimizations more or less effective. However, most implementations that you can find on the internet and that are more efficient than this one are less readable and more complex. If you know other optimizations, please post a comment
