- cross-posted to:
- data_engineering
What are your real-world applications of this versatile data structure?
They are useful for query optimization in databases like SQLite and query engines like Apache Spark, where they can cheaply rule out rows that can't match before a more expensive lookup. Application developers can also use them as concise representations of user data, for example to filter out previously seen items.
The linked site gives a short introduction to Bloom filters along with some links to further reading:
A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either definitely is not in the set or may be in the set.
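For the sake of discussion, here's a minimal sketch of the idea in Python (my own illustration, not from the linked site). It derives the k hash functions from a single SHA-256 digest via double hashing:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch; illustrative, not production code."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: str):
        # Derive k indexes from two 64-bit halves of one digest
        # (the double-hashing trick from Kirsch & Mitzenmacher).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set";
        # True means "may be in the set" (false positives possible).
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))

bf = BloomFilter(num_bits=1024, num_hashes=4)
bf.add("alice")
print(bf.might_contain("alice"))  # True (it was definitely added)
print(bf.might_contain("bob"))    # Usually False; True would be a false positive
```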
Collage sounds really interesting, will check it out. Another variation on the Bloom filter I recently learned about is the count-min sketch. It allows for storing/incrementing a count along with each key, and can answer "probably in the set with count greater than _" or "definitely not in the set".
Thanks for adding more detail on the DB use-cases!
Interesting. Do I understand it correctly if I say it's a Bloom filter where, instead of setting a bit to 1 for each of the hashes, you increment a counter for that hash?
How do you infer the count then, by taking the minimum over all matching counters? Because intuitively it seems to me like you would need a lot more space to avoid the counts being overestimated.
This data structure uses a 2-dimensional array of counters, one row per hash function, as documented in this Scala implementation: https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/CountMinSketch.scala. I'm still trying to understand it as well.
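As far as I understand the paper linked below, your intuition is right: the query takes the minimum across rows, and it works because collisions can only inflate a counter, never deflate it. A rough Python sketch of that idea (mine, not Algebird's API):

```python
import hashlib

class CountMinSketch:
    """Rough count-min sketch: a depth x width table of counters,
    one hash function per row; queries take the min across rows."""

    def __init__(self, depth: int, width: int):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Salt the hash with the row number to get one function per row.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # Each row's counter may be inflated by collisions but never
        # deflated, so the minimum across rows is the tightest estimate.
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

cms = CountMinSketch(depth=4, width=256)
for _ in range(3):
    cms.add("apple")
print(cms.estimate("apple"))   # >= 3 (may overestimate, never underestimates)
print(cms.estimate("banana"))  # 0 unless it collides in every row
```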
Similar to your idea, I had thought that by using k Bloom filters, each with its own hash function and bit array, one could store an approximate count up to k for each key, though that might be a wasteful or naïve solution.
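For what it's worth, here's a rough rendering of that k-filter idea (hypothetical, reusing the BloomFilter class from the sketch above; for simplicity all filters share hash functions rather than each having its own). Filter i is set once an item has been seen more than i times, so the count is the number of consecutive filters that match:

```python
class UnaryCounter:
    """Hypothetical unary counter built from k Bloom filters.
    Counts saturate at k, and like the sketch they can only overestimate."""

    def __init__(self, k: int, num_bits: int, num_hashes: int):
        # Requires the BloomFilter class defined earlier in the thread.
        self.filters = [BloomFilter(num_bits, num_hashes) for _ in range(k)]

    def add(self, item: str) -> None:
        # Set the first filter that doesn't already contain the item.
        for bf in self.filters:
            if not bf.might_contain(item):
                bf.add(item)
                return

    def estimate(self, item: str) -> int:
        # Count how many consecutive filters match, starting from the first.
        count = 0
        for bf in self.filters:
            if not bf.might_contain(item):
                break
            count += 1
        return count

uc = UnaryCounter(k=8, num_bits=1024, num_hashes=4)
for _ in range(3):
    uc.add("apple")
print(uc.estimate("apple"))  # 3, barring false positives
```

Compared to a count-min sketch this spends a whole bit array per count level, which is probably why it feels wasteful.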
PDF link: http://www.eecs.harvard.edu/~michaelm/CS222/countmin.pdf