A Bloom filter is a space-efficient, probabilistic data structure that you can use to test whether an element is a member of a set. With this type of structure, you can have false positives. For example, the filter can indicate that an element is in the set, even though it isn't. However, you can't have false negatives. So, if you add an element to the set, then the filter must indicate that the element is in the set.
The filter achieves this by using multiple hash functions to map an element to several bits in a fixed-size bit array. To control the probability of false positives, you can adjust the number of bits in the array and the number of hash functions used.
Use cases
This section describes the following use cases for using Bloom filters:
Deduplicate advertisements and events: If you have an ecommerce site, streaming service, advertising network, or marketing platform, then Bloom filters can help you determine whether a user saw an advertisement, received a promotional email or notification, or purchased a product.
You can use a Bloom filter to store all of the products that a user purchases.
- If a product isn't in the filter, then show the ad to the user and add the product to the filter.
- If the product is in the filter, then the user probably saw the associated advertisement and purchased the product. Therefore, find a different ad to show to the user.
Detect fraud: You can use Bloom filters to detect whether a credit card is flagged as stolen. To do this, use a filter that contains cards reported as stolen. When a card is used, check whether it appears in the filter.
- If the card isn't in the filter, then it isn't marked as stolen.
- If the card is in the filter, then you can check the card against the main database or deny the purchase.
Filter spam and harmful content: You can use Bloom filters to screen content for potential threats, harmful materials, and spam. To do this, create a filter that contains malicious URLs, spam email addresses, and spam phone numbers. When the user inputs a URL, or receives an email or text, check whether this information appears in the filter.
- If the URL, email, or text isn't in the filter, then allow the user to access the site represented by the URL, or receive the email or text.
- If the URL, email, or text is in the filter, then deny the user access to the associated site or prevent the user from receiving the email or text.
Detect duplicate usernames: You can use Bloom filters to determine whether a username is new or if it already exists. To do this, use a filter to track every username that signs up for your ecommerce site or streaming service. When a new user attempts to sign up with their username, check whether the username appears in the filter.
- If the username isn't in the filter, then create the account and add the username to the filter.
- If the username is in the filter, then reject the username.
For more information about these use cases, see Common use cases for Bloom filters.
Availability
If you create a Memorystore for Valkey instance, version 8.0 and later, then version 1.0 of the Bloom data type and associated commands is available automatically. This data type is API-compatible with the Bloom filter command syntax of the following Valkey client libraries:
Bloom filter types
You can have the following types of Bloom filters:
- Scaling: This type of filter doesn't have a fixed capacity so the filter can grow. If the filter reaches its capacity and you add a new, unique item to the filter, then the filter scales out and a new subfilter is created. This subfilter has a larger capacity than the filter.
- Non-scaling: This type of filter has a fixed capacity so there's a limit to how many items you can add to the filter. If the filter reaches its capacity and you attempt to add a new, unique item to the filter, then you receive an error.
For more information about the differences between these types of Bloom filters, see Scaling and non-scaling Bloom filters.
Bloom filter properties
A Bloom filter has the following properties:
- Capacity: The number of items that a Bloom filter can hold before either the filter scales out (for a scaling filter) or it rejects adding additional items (for a non-scaling filter).
- False positive rate: The rate that controls the probability of operations for Bloom filters resulting in false positives. For example, an operation that you use to check whether an element is in the filter indicates that the element is in the filter, even though it isn't.
- Expansion: This property is associated with scaling Bloom filters. It controls the growth in overall capacity when the filter scales out because the filter reaches its capacity.
- Scaling or non-scaling: Whether a Bloom filter is a scaling or non-scaling filter.
For more information about the properties of Bloom filters, see Bloom properties.
Bloom filter objects
A Bloom filter object can consume a maximum of 128 MB of memory. To check how
much memory a Bloom filter consumes, use the BF.INFO key SIZE command, where key is the key name for the filter and
SIZE is the number of bytes that the filter consumes.
Bloom categories
To manage access to Bloom commands and data, use the @bloom
category. In addition to this category, the following categories use Bloom
commands: @read, @write, and @fast.
The following table indicates whether you can map Bloom commands to the @read,
@write, @fast, and @bloom categories.
| Bloom command | @bloom |
@read |
@write |
@fast |
|---|---|---|---|---|
BF.ADD |
Y | N | Y | Y |
BF.CARD |
Y | Y | N | Y |
BF.EXISTS |
Y | Y | N | Y |
BF.INFO |
Y | Y | N | Y |
BF.INSERT |
Y | N | Y | Y |
BF.MADD |
Y | N | Y | Y |
BF.MEXISTS |
Y | Y | N | Y |
BF.RESERVE |
Y | Y | N | Y |
Bloom commands
This section lists and describes the Bloom commands that you can use to perform Bloom operations on the Bloom data type.
| Command | Description |
|---|---|
BF.ADD |
Add a single item to a Bloom filter. If the filter doesn't exist, then the command creates it. |
BF.CARD |
Return the cardinality of a Bloom filter. |
BF.EXISTS |
Determine if a Bloom filter contains the item that you specify. |
BF.INFO |
Return usage information and properties of a Bloom filter. |
BF.INSERT |
Create a Bloom filter with 0 or more items or add items to an existing filter. |
BF.MADD |
Add one or more items to a Bloom filter. If the filter doesn't exist, then the command creates it. |
BF.MEXISTS |
Determine if the bloom filter contains one or more items. |
BF.RESERVE |
Create an empty bloom filter with the properties that you specify. |
Check Bloom filters
You can check the following information about Bloom filters:
- Memory usage: check if a filter reaches its memory usage limit. To check
the amount of memory that a filter uses, use the
BF.INFOcommand. - Capacity: check if a filter is a scaling filter. If it is, scale the filter so that it reaches its capacity, and then scale out.
For more information about checking the memory usage and capacity for Bloom filters, see Handle large Bloom filters.