What is a Bloom Filter?

A Bloom filter is a lightweight, probabilistic data structure used to quickly check whether an element is a member of a set. Its key characteristic is that it can definitively say if an item is not in the set, but it can only say an item is probably in the set — meaning it may sometimes produce false positives (reporting something exists when it actually doesn't).
A Bloom filter does not store the actual data itself. Instead, it keeps track of membership information by marking bits in a fixed-size bit array. This makes it extremely space-efficient, allowing it to represent millions of entries with just a few megabytes of memory. Operations are also performed in constant time (O(1)), making it very fast.
However, there are trade-offs. Traditional Bloom filters do not support removing or updating elements, and the possibility of false positives must be considered. This makes them most suitable for cases where perfect accuracy is not required, and the goal is to avoid unnecessary database queries or reduce memory usage.
In summary, Bloom filters are ideal for membership checks in large datasets where speed and memory efficiency matter more than perfect accuracy. They're commonly used in web caching, malicious URL detection, spam filters, NoSQL database indexing, and big data processing.
How Does a Bloom Filter Work?
- A Bloom filter starts with a fixed-size bit array (e.g., 10,000 bits), all set to
0
. - To add an item, multiple hash functions are applied to the item to determine positions in the bit array.
- The bits at those positions are set to
1
. - To check for membership:
- The same hash functions are used on the query item.
- If all corresponding bits are
1
, the item is probably present. - If any bit is
0
, the item is definitely not present.
Simple Example
Imagine the bit array looks like this:
Bit Set: [0, 1, 0, 1, 0, 0, 1, 0, 0, 0]
"savasayik"
→ hash1 = 1, hash2 = 3, hash3 = 6 → all are1
→ Probably present"mehmet"
→ hash1 = 2, hash2 = 4, hash3 = 7 → bit 2 =0
→ Definitely not present

Real-World Use Cases
Use Case | Description |
---|---|
Web Caching | Quickly check if a URL has already been accessed |
Email Systems | Is the sender in the spam list? |
Web Browsers | Chrome and Firefox use them to detect malicious URLs |
Databases | ClickHouse, Cassandra, HBase use Bloom filters for fast filtering |
Blockchain | Used in Ethereum light clients for fast membership checking |
Big Data | Employed in Hadoop, Spark, and Flink for efficient filtering |
Comparison to Classic Data Structures
Feature | Bloom Filter | Hash Set / Map |
---|---|---|
Memory Usage | Very Low | High |
False Positives | Yes (rare) | No |
Lookup Speed | Constant (very fast) | Fast (but costlier) |
Stores Data | No | Yes |
Supports Deletion | No (in basic form) | Yes |
Simple Bloom Filter Example in Go
package main
import (
"fmt"
"hash/fnv"
)
const size = 1000 // Size of the bit array
type BloomFilter struct {
bitset []bool
}
func NewBloomFilter() *BloomFilter {
return &BloomFilter{
bitset: make([]bool, size),
}
}
// Two simple hash functions using FNV
func hash1(s string) int {
h := fnv.New32()
h.Write([]byte(s))
return int(h.Sum32()) % size
}
func hash2(s string) int {
h := fnv.New64()
h.Write([]byte(s))
return int(h.Sum64()) % size
}
func (bf *BloomFilter) Add(item string) {
bf.bitset[hash1(item)] = true
bf.bitset[hash2(item)] = true
}
func (bf *BloomFilter) PossiblyContains(item string) bool {
return bf.bitset[hash1(item)] && bf.bitset[hash2(item)]
}
func main() {
bf := NewBloomFilter()
// Adding values
bf.Add("savasayik")
bf.Add("mustafasandal")
// Checking membership
fmt.Println("savasayik:", bf.PossiblyContains("savasayik")) // true
fmt.Println("mustafasandal:", bf.PossiblyContains("mustafasandal")) // true
fmt.Println("mehmet:", bf.PossiblyContains("mehmet")) // false (or rarely true)
}
Notes
- We use two FNV hash functions (
fnv.New32
andfnv.New64
) for simplicity. - This Bloom filter only supports
Add
andPossiblyContains
operations. - False positives are possible, but false negatives are not.
Use a real package like github.com/willf/bloom
for production use.