Abstract
Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The quick-select implementation uses a partitioning scheme based on the "Dutch National Flag Problem" approach as described in [18], chosen for its robustness against repeat elements.
- 2.
Following suggestions in [25], we also ran experiments where each point used the 50-NN as labels, which did not significantly improve performance.
References
Ahle, T.D., Aumüller, M., Pagh, R.: Parameter-free locality sensitive hashing for spherical range reporting. In: SODA (2017)
Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K., Nett, M.: Estimating local intrinsic dimensionality. In: KDD (2015)
Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I.P., Schmidt, L.: Practical and optimal LSH for angular distance. In: NIPS, pp. 1225–1233 (2015)
Aumüller, M., Bernhardsson, E., Faithfull, A.J.: Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87 (2020)
Aumüller, M., Ceccarello, M.: The role of local dimensionality measures in benchmarking nearest neighbor search. Inf. Syst. 101, 101807 (2021)
Aumüller, M., Ceccarello, M.: Recent approaches and trends in approximate nearest neighbor search, with remarks on benchmarking. IEEE Data Eng. Bull. 46(3), 89–105 (2023)
Aumüller, M., Christiani, T., Pagh, R., Vesterli, M.: PUFFINN: parameterless and universally fast finding of nearest neighbors. In: ESA (2019)
Bawa, M., Condie, T., Ganesan, P.: LSH forest: self-tuning indexes for similarity search. In: WWW. ACM, pp. 651–660 (2005)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9) (sep 1975)
Beygelzimer, A., Kakade, S.M., Langford, J.: Cover trees for nearest neighbor. In: ICML 2006. ACM
Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC (2002)
Christiani, T.: A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: SODA. SIAM, pp. 31–46 (2017)
Christiani, T.: Fast locality-sensitive hashing frameworks for approximate near neighbor search. In: SISAP (2019)
Christiani, T., Pagh, R., Thorup, M.: Confirmation sampling for exact nearest neighbor search. In: SISAP (2020)
Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: automatic query expansion with a generative feature model for object retrieval. In: ICCV (2007)
Dasgupta, S., Sinha, K.: Randomized partition trees for exact nearest neighbor search. In: COLT (2013)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG. SCG ’04 (2004)
Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall (1976)
Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning. In: CIKM. ACM, pp. 669–678 (2008)
Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P., Lomeli, M., Hosseini, L., Jégou, H.: The faiss library. CoRR abs/2401.08281 (2024)
Efremenko, K., Kontorovich, A., Noivirt, M.: Fast and bayes-consistent nearest neighbors. In: AISTATS. Proceedings of Machine Learning Research, vol. 108. PMLR, pp. 1276–1286 (2020)
Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: ICDM Workshops. IEEE Computer Society, pp. 468–473 (2013)
Houle, M.E., Ma, X., Oria, V., Sun, J.: Query expansion for content-based similarity search using local and global features. ACM Trans. Multim. Comput. Commun. Appl. 13(3) (2017)
Houle, M.E., Nett, M.: Rank cover trees for nearest neighbor search. In: SISAP (2013)
Hyvönen, V., Jääsaari, E., Roos, T.: A multilabel classification framework for approximate nearest neighbor search. J. Mach. Learn. Res. 25(46), 1–51 (2024)
Hyvönen, V., Jääsaari, E., Roos, T.: A multilabel classification framework for approximate nearest neighbor search. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) NeurIPS (2022)
Hyvönen, V., Pitkanen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., Roos, T.: Fast nearest neighbor search through sparse random projections and voting. In: Big Data 2016 (2016)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC. ACM, pp. 604–613 (1998)
Iwasaki, M., Miyazaki, D.: Optimization of indexing based on k-nearest neigh-bor graph for proximity search in high-dimensional data. CoRR abs/1810.07355 (2018)
Johnsen, M.H.: Helping out a neighbor: a study of partition-based algorithms for k-approximate nearest neighbor search. Master’s thesis, ITU (2024)
Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: KDD (2006)
Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB. ACM (2007)
Malkov, Y.A., Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42(4), 824–836 (2020)
Manohar, M.D., Shen, Z., Blelloch, G.E., Dhulipala, L., Gu, Y., Simhadri, H.V., Sun, Y.: Parlayann: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. In: PPoPP. ACM, pp. 270–285 (2024)
Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognit. Lett. 31(11) (2010)
Simhadri, H.V., Williams, G., Aumüller, M., Douze, M., Babenko, A., Baranchuk, D., Chen, Q., Hosseini, L., Krishnaswamy, R., Srinivasa, G., Subramanya, S.J., Wang, J.: Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search. In: NeurIPS (Competition and Demos) (2021)
Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)
Subramanya, S.J., Devvrit, Simhadri, H.V., Krishnaswamy, R., Kadekodi, R.: Rand-nsg: Fast accurate billion-point nearest neighbor search on a single node. In: NeurIPS (2019)
Sun, P., Simcha, D., Dopson, D., Guo, R., Kumar, S.: SOAR: improved indexing for approximate nearest neighbor search. In: NeurIPS (2023)
Acknowledgments
We thank the anonymous reviewers for their insightful comments and suggestions that helped us to improve this paper. This project received funding from the Innovation Fund Denmark for the project DIREC (9142-00001B).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Johnsen, M.H., Aumüller, M. (2025). An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_13
Download citation
DOI: https://doi.org/10.1007/978-3-031-75823-2_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-75822-5
Online ISBN: 978-3-031-75823-2
eBook Packages: Computer ScienceComputer Science (R0)