Thursday, April 10, 2008

Sorting a Big File

You have a file that contains at most 10^7 positive integers, all distinct, all between 0 and 10^7. You want to sort these numbers but only have about 2 megabytes of RAMs. How do you do it?

Before you read the answer here is a tip: the best answer sorts the file in linear time reading every number into memory only once.

Give up? Since the file contains all distinct integers between 0 and 10,000,000, we can use bits to represent whether every number in this range exists. For this we need only 10^7 bits or 1,250,000 bytes (1.25MB) so it can fit in memory. Here is some pseudocode:


boolean bit[10000000];
//An array of bits. I'm assuming a boolean array of size 8
// uses only one byte.
//Set it to 0
for (int i = i; i < n; i++){
bit[i] = 0;
}

//read every number from file
f = filestream("filename"); //pseudocode

while (int i << f){
bit[i] = 1;
};

//write it out
for (int i = i; i < n; i++){
if (bit[i] == 1)
cout << i;
};


This is a very old question. It appears in Programming Pearls by Jon Bentley, a really fun read!



No comments:

ShareThis