A few days ago, Raymond Hettinger wrote a simple Bloom filter in ActiveState recipe. This reminded of the fact that I wrote a Bloom filter long time ago with a different hash function. So I forked Hettinger's code and extended the Bloom filter to allow union and intersection operations between two Bloom filters. Such operations are very easy, and allowed if both the Bloom filters use the same hash function. It may be used in distributed systems, where two Bloom filters are independently constructed and merged eventually. I also externalized the hash function to be given as the input to the Bloom filter class. So the user can test it with different hash function, if they have a yielding function to generate hash values.

Bloom filter is a very good data structure for space-efficient storage, whereas it is not suited if you insist on having no false positive in searches. Also if you want to do other operations like sorting the dataset or deleting elements from the dataset, Bloom filter is not suited.

However with some slight modification, sacrificing some space, we can achieve a few more things. For instance, if you want to count the number of times a given key is searched, you can allocate more than a bit for every hash value and increment the positional value as a counter for every search. Least positional value (minus 1) for a given string tells how many times that string has been searched. But this operation would increase the storage requirement. My implementation does not include any such variation.

Bloom filter is a very good data structure for space-efficient storage, whereas it is not suited if you insist on having no false positive in searches. I have one of my furnace filters business. It provide good services for air filter.

ReplyDelete