Aussie AI
Bloom Filters in AI
-
Last Updated 14 September, 2024
-
by David Spuler, Ph.D.
Bloom filters are a probabilistic data structure based on hashing and bitflags. Multiple hash functions are computed for each key, and the resulting hash values are used to set bitflags, as described in more detail below.
Like hashing, Bloom filters have been used as a data structure to speed up neural network inference. However, much of the research literature on Bloom filters concerns a different topic: Weightless Neural Networks (WNNs). WNNs use a different type of neuron based on binary bits, rather than matrix multiplications, and these bitflag neurons can be approximated using Bloom filters. As such, that part of the research is less relevant to optimizing inference in recent neural networks such as Transformers, and is not examined in detail below.
What are Bloom Filters?
Insertion works as follows: given a key, multiple hash functions are computed for that key, and a binary flag is set in a bitflag table at each of the resulting hash offsets. In this way, a key maps to a pattern of multiple bits.
The Bloom filter lookup for a key works as follows: to test whether a key is present, the same hash functions are computed, and the bitflag table is checked to see whether all of those bits are set. If any bit is unset, the key is definitely not in the Bloom filter. If all of the bits are set, the key is possibly present, but other keys may have coincidentally set all of those bits (a "false positive"), so presence is not 100% guaranteed, and a second, exact data structure may need to be queried to confirm. Hence, a Bloom filter is a fast test that a key is not in a set, but a slower test when the key appears to be present. This makes it an example of a "common case first" optimization, where a fast check can skip a more expensive computation.
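For illustration, here is a minimal C++ sketch of the insert and lookup operations described above. The table size M, the number of hash functions K, and the seeded-hash scheme are arbitrary illustrative choices, not taken from any particular implementation:

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>

// Minimal Bloom filter sketch. M (bit-table size) and K (number of
// hash functions) are illustrative values, not tuned parameters.
template <std::size_t M = 8192, std::size_t K = 3>
class BloomFilter {
public:
    void insert(const std::string& key) {
        for (std::size_t i = 0; i < K; ++i)
            bits_.set(hashWithSeed(key, i));   // set one bit per hash function
    }

    // false => key is definitely NOT in the set (the fast common case);
    // true  => key is possibly in the set, but may be a false positive,
    //          so an exact backup structure may need to be consulted.
    bool possiblyContains(const std::string& key) const {
        for (std::size_t i = 0; i < K; ++i)
            if (!bits_.test(hashWithSeed(key, i)))
                return false;                  // a missing bit proves absence
        return true;
    }

private:
    // Crude seeded hash for illustration: prefix the key with the seed
    // so that each seed behaves as a distinct hash function.
    static std::size_t hashWithSeed(const std::string& key, std::size_t seed) {
        return std::hash<std::string>{}(std::to_string(seed) + ':' + key) % M;
    }

    std::bitset<M> bits_;
};
```

A caller would typically use this to guard a slower exact lookup: if possiblyContains returns false, the expensive check is skipped entirely; only a true result requires the follow-up query.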
The lookup complexity of a Bloom filter is constant, but not as fast as plain hashing. A hash table uses only a single hash function, so it has O(1) lookup, whereas a Bloom filter computes k hash functions, so it has O(k) lookup complexity.
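In practice, the k hash values need not require k full passes over the key. A well-known trick (double hashing, analyzed by Kirsch and Mitzenmacher) derives all k values from just two base hashes; a minimal sketch, with hypothetical names:

```cpp
#include <cstddef>

// Derive the i-th of k hash values from two base hashes h1 and h2,
// so the key itself is only hashed twice regardless of k.
std::size_t ithHash(std::size_t h1, std::size_t h2, std::size_t i,
                    std::size_t tableSize) {
    return (h1 + i * h2) % tableSize;
}
```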
Bloom Filters and AI Models
Some of the research below uses Bloom filters to speed up model inference (as is also true of simple hashing). Research on the use of Bloom filters with modern types of neural networks includes:
- X Jiao, V Akhlaghi, Y Jiang, 2018, Energy-efficient neural networks using approximate computation reuse, 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE) https://ieeexplore.ieee.org/document/8342202, PDF: https://www.date-conference.com/proceedings-archive/2018/pdf/0571.pdf
- Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John, 2023, ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks, https://arxiv.org/abs/2304.10618
- D Liang, M Hashimoto, H Awano, 2021, Bloomca: A memory efficient reservoir computing hardware implementation using cellular automata and ensemble bloom filter, 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), https://ieeexplore.ieee.org/abstract/document/9474047/
- Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks, 2018, Weightless: Lossy Weight Encoding For Deep Neural Network Compression, https://arxiv.org/abs/1711.04686, PDF: http://proceedings.mlr.press/v80/reagan18a/reagan18a.pdf (Uses a "Bloomier filter" as a probabilistic data structure.)
- Wikipedia, Bloom filter, https://en.wikipedia.org/wiki/Bloom_filter
- Jack W Rae, Sergey Bartunov, Timothy P Lillicrap, 10 Jun 2019, Meta-Learning Neural Bloom Filters, https://arxiv.org/abs/1906.04304
- Sourin Chakrabarti, 18 Nov 2020 (v2), Efficient image retrieval using multi neural hash codes and bloom filters, https://arxiv.org/abs/2011.03234
- David Spuler, March 2024, Chapter 18. Parallel Data Structures, Generative AI in C++: Coding Transformers and LLMs, https://www.amazon.com/dp/B0CXJKCWX9
- Atsuki Sato, Yusuke Matsui, 2023, Fast Partitioned Learned Bloom Filter, Advances in Neural Information Processing Systems 36 (NeurIPS 2023), https://proceedings.neurips.cc/paper_files/paper/2023/hash/7b2e844c52349134268e819a9b56b9e8-Abstract-Conference.html, PDF: https://proceedings.neurips.cc/paper_files/paper/2023/file/7b2e844c52349134268e819a9b56b9e8-Paper-Conference.pdf, Code: https://github.com/atsukisato/FastPLBF
- David Spuler, March 2024, Bloom Filters, in Generative AI in C++, https://www.aussieai.com/book/ch18-bloom-filters
- A. T. L. Bacellar, Z. Susskind, M. Breternitz, L. K. John, F. M. G. França and P. M. V. Lima, "Soon Filter: Advancing Tiny Neural Architectures for High Throughput Edge Inference," 2024 International Joint Conference on Neural Networks (IJCNN), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/IJCNN60899.2024.10650678, https://ieeexplore.ieee.org/abstract/document/10650678
Weightless Neural Networks (WNNs)
Weightless neural networks (WNNs) are a specific neural network architecture based on binary inputs and outputs, with neurons that act as lookup tables over bit patterns. WNNs are often implemented using Bloom filters as a memory optimization.
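As a rough illustration of that connection, here is a hedged sketch of a single WiSARD-style RAM neuron whose exponential-size lookup table is replaced by a Bloom filter. The BloomFilter class is the hypothetical sketch from earlier in this article, and the tuple encoding is an arbitrary choice:

```cpp
#include <string>
#include <vector>

// One "RAM neuron" of a weightless neural network: training memorizes
// the binary input tuples it observes; inference fires if the tuple was
// (probably) seen before. A Bloom filter stands in for the 2^n-entry
// RAM table, trading exactness for memory.
class BloomRAMNeuron {
public:
    void train(const std::vector<bool>& tuple) {
        filter_.insert(encode(tuple));
    }

    // May fire spuriously on unseen tuples (Bloom filter false positives).
    bool fires(const std::vector<bool>& tuple) const {
        return filter_.possiblyContains(encode(tuple));
    }

private:
    static std::string encode(const std::vector<bool>& tuple) {
        std::string s;
        for (bool b : tuple)
            s.push_back(b ? '1' : '0');
        return s;
    }

    BloomFilter<1024, 3> filter_;  // hypothetical class from the earlier sketch
};
```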
- L Santiago, L Verona, F Rangel, F Firmino, 2020, Weightless neural networks as memory segmented bloom filters, Neurocomputing, Volume 416, 27 November 2020, Pages 292-304, https://www.sciencedirect.com/science/article/abs/pii/S0925231220305105, PDF: https://www.researchgate.net/profile/Daniel-Menasche/publication/339016319_Weightless_Neural_Networks_as_Memory_Segmented_Bloom_Filters/links/5e38e822a6fdccd9658479a6/Weightless-Neural-Networks-as-Memory-Segmented-Bloom-Filters.pdf
- J Rae, S Bartunov, T Lillicrap, 2019, Meta-learning neural bloom filters, https://arxiv.org/abs/1906.04304, PDF: http://proceedings.mlr.press/v97/rae19a/rae19a.pdf
- M Mitzenmacher, 2018, A model for learned bloom filters and optimizing by sandwiching, Advances in Neural Information Processing Systems (NeurIPS 2018), https://arxiv.org/abs/1901.00902, PDF: https://proceedings.neurips.cc/paper_files/paper/2018/file/0f49c89d1e7298bb9930789c8ed59d48-Paper.pdf
- M Mitzenmacher, 2018, A model for learned bloom filters and related structures, arXiv preprint arXiv:1802.00884, https://arxiv.org/abs/1802.00884
- R Patgiri, A Biswas, S Nayak, 2023, DeepBF: Malicious URL detection using Learned Bloom Filter and Evolutionary Deep Learning, Computer Communications, Elsevier, 2023, https://arxiv.org/abs/2103.12544, PDF: https://www.sciencedirect.com/science/article/pii/S0140366422004832
- Z Dai, A Shrivastava, P Reviriego, 2022, Optimizing Learned Bloom Filters: How Much Should Be Learned? IEEE Embedded Systems Letters (Volume 14, Issue 3, September 2022), https://ieeexplore.ieee.org/document/9724196, PDF: https://e-archivo.uc3m.es/bitstream/handle/10016/35826/Optimizing_IEEEESL_2022_ps.pdf?sequence=1
- Z Susskind, A Arora, IDS Miranda, LAQ Villon, 2022, Weightless neural networks for efficient edge inference, PACT ’22, October 10–12, 2022, Chicago, IL, https://arxiv.org/abs/2203.01479, PDF: https://dl.acm.org/doi/pdf/10.1145/3559009.3569680
- Leonardo Neumann, Antonio Guimarães, Diego F. Aranha, Edson Borin, 29 Mar 2024, Homomorphic WiSARDs: Efficient Weightless Neural Network training over encrypted data, https://arxiv.org/abs/2403.20190
- Zachary Susskind 2024, Weightless Neural Networks for Fast, Low-Energy Inference, Ph.D. Dissertation, The University of Texas at Austin, https://zsknd.com/dissertation_final.pdf
More AI Research
Read more about: