iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://doi.org/10.1007/978-3-031-75823-2_13
An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search | SpringerLink
Skip to main content

An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search

  • Conference paper
  • First Online:
Similarity Search and Applications (SISAP 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15268))

Included in the following conference series:

  • 39 Accesses

Abstract

Approximate nearest neighbor search in high-dimensional metric spaces is crucial in modern data science pipelines. Efficient search algorithms often rely on partitioning the metric space. To find the approximate nearest neighbors of a query point, a candidate set is constructed based on the points that belong to the same part in the partition, and the closest points among these candidates are identified via a bruteforce search. Hyvönen et al. (JMLR, 2024) argue that viewing this problem as a multi-class labeling problem suggests that this traditional method is not optimal. Instead, they propose a “natural classifier” search strategy that incorporates the true labels of the candidate points, demonstrating faster searches and smaller candidate sets for the same accuracy for tree-based space partitioning methods. This paper explores the natural classifier and other search strategies for partitioning based on locality-sensitive hashing. We propose a new strategy that offers more precise control over the balance between performance and quality. Our analysis highlights the trade-offs between these methods, providing insights into optimizing search efficiency in various contexts.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 54.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 64.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    The quick-select implementation uses a partitioning scheme based on the "Dutch National Flag Problem" approach as described in [18], chosen for its robustness against repeat elements.

  2. 2.

    Following suggestions in [25], we also ran experiments where each point used the 50-NN as labels, which did not significantly improve performance.

References

  1. Ahle, T.D., Aumüller, M., Pagh, R.: Parameter-free locality sensitive hashing for spherical range reporting. In: SODA (2017)

    Google Scholar 

  2. Amsaleg, L., Chelly, O., Furon, T., Girard, S., Houle, M.E., Kawarabayashi, K., Nett, M.: Estimating local intrinsic dimensionality. In: KDD (2015)

    Google Scholar 

  3. Andoni, A., Indyk, P., Laarhoven, T., Razenshteyn, I.P., Schmidt, L.: Practical and optimal LSH for angular distance. In: NIPS, pp. 1225–1233 (2015)

    Google Scholar 

  4. Aumüller, M., Bernhardsson, E., Faithfull, A.J.: Ann-benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Inf. Syst. 87 (2020)

    Google Scholar 

  5. Aumüller, M., Ceccarello, M.: The role of local dimensionality measures in benchmarking nearest neighbor search. Inf. Syst. 101, 101807 (2021)

    Article  Google Scholar 

  6. Aumüller, M., Ceccarello, M.: Recent approaches and trends in approximate nearest neighbor search, with remarks on benchmarking. IEEE Data Eng. Bull. 46(3), 89–105 (2023)

    Google Scholar 

  7. Aumüller, M., Christiani, T., Pagh, R., Vesterli, M.: PUFFINN: parameterless and universally fast finding of nearest neighbors. In: ESA (2019)

    Google Scholar 

  8. Bawa, M., Condie, T., Ganesan, P.: LSH forest: self-tuning indexes for similarity search. In: WWW. ACM, pp. 651–660 (2005)

    Google Scholar 

  9. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9) (sep 1975)

    Google Scholar 

  10. Beygelzimer, A., Kakade, S.M., Langford, J.: Cover trees for nearest neighbor. In: ICML 2006. ACM

    Google Scholar 

  11. Charikar, M.S.: Similarity estimation techniques from rounding algorithms. In: STOC (2002)

    Google Scholar 

  12. Christiani, T.: A framework for similarity search with space-time tradeoffs using locality-sensitive filtering. In: SODA. SIAM, pp. 31–46 (2017)

    Google Scholar 

  13. Christiani, T.: Fast locality-sensitive hashing frameworks for approximate near neighbor search. In: SISAP (2019)

    Google Scholar 

  14. Christiani, T., Pagh, R., Thorup, M.: Confirmation sampling for exact nearest neighbor search. In: SISAP (2020)

    Google Scholar 

  15. Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: automatic query expansion with a generative feature model for object retrieval. In: ICCV (2007)

    Google Scholar 

  16. Dasgupta, S., Sinha, K.: Randomized partition trees for exact nearest neighbor search. In: COLT (2013)

    Google Scholar 

  17. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SCG. SCG ’04 (2004)

    Google Scholar 

  18. Dijkstra, E.W.: A Discipline of Programming. Prentice-Hall (1976)

    Google Scholar 

  19. Dong, W., Wang, Z., Josephson, W., Charikar, M., Li, K.: Modeling LSH for performance tuning. In: CIKM. ACM, pp. 669–678 (2008)

    Google Scholar 

  20. Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P., Lomeli, M., Hosseini, L., Jégou, H.: The faiss library. CoRR abs/2401.08281 (2024)

    Google Scholar 

  21. Efremenko, K., Kontorovich, A., Noivirt, M.: Fast and bayes-consistent nearest neighbors. In: AISTATS. Proceedings of Machine Learning Research, vol. 108. PMLR, pp. 1276–1286 (2020)

    Google Scholar 

  22. Houle, M.E.: Dimensionality, discriminability, density and distance distributions. In: ICDM Workshops. IEEE Computer Society, pp. 468–473 (2013)

    Google Scholar 

  23. Houle, M.E., Ma, X., Oria, V., Sun, J.: Query expansion for content-based similarity search using local and global features. ACM Trans. Multim. Comput. Commun. Appl. 13(3) (2017)

    Google Scholar 

  24. Houle, M.E., Nett, M.: Rank cover trees for nearest neighbor search. In: SISAP (2013)

    Google Scholar 

  25. Hyvönen, V., Jääsaari, E., Roos, T.: A multilabel classification framework for approximate nearest neighbor search. J. Mach. Learn. Res. 25(46), 1–51 (2024)

    MathSciNet  Google Scholar 

  26. Hyvönen, V., Jääsaari, E., Roos, T.: A multilabel classification framework for approximate nearest neighbor search. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) NeurIPS (2022)

    Google Scholar 

  27. Hyvönen, V., Pitkanen, T., Tasoulis, S., Jääsaari, E., Tuomainen, R., Wang, L., Corander, J., Roos, T.: Fast nearest neighbor search through sparse random projections and voting. In: Big Data 2016 (2016)

    Google Scholar 

  28. Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: STOC. ACM, pp. 604–613 (1998)

    Google Scholar 

  29. Iwasaki, M., Miyazaki, D.: Optimization of indexing based on k-nearest neigh-bor graph for proximity search in high-dimensional data. CoRR abs/1810.07355 (2018)

    Google Scholar 

  30. Johnsen, M.H.: Helping out a neighbor: a study of partition-based algorithms for k-approximate nearest neighbor search. Master’s thesis, ITU (2024)

    Google Scholar 

  31. Li, P., Hastie, T.J., Church, K.W.: Very sparse random projections. In: KDD (2006)

    Google Scholar 

  32. Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K.: Multi-probe LSH: efficient indexing for high-dimensional similarity search. In: VLDB. ACM (2007)

    Google Scholar 

  33. Malkov, Y.A., Yashunin, D.A.: Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs. IEEE Trans. Pattern Anal. Mach. Intell. 42(4), 824–836 (2020)

    Article  Google Scholar 

  34. Manohar, M.D., Shen, Z., Blelloch, G.E., Dhulipala, L., Gu, Y., Simhadri, H.V., Sun, Y.: Parlayann: Scalable and deterministic parallel graph-based approximate nearest neighbor search algorithms. In: PPoPP. ACM, pp. 270–285 (2024)

    Google Scholar 

  35. Paulevé, L., Jégou, H., Amsaleg, L.: Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognit. Lett. 31(11) (2010)

    Google Scholar 

  36. Simhadri, H.V., Williams, G., Aumüller, M., Douze, M., Babenko, A., Baranchuk, D., Chen, Q., Hosseini, L., Krishnaswamy, R., Srinivasa, G., Subramanya, S.J., Wang, J.: Results of the NeurIPS’21 challenge on billion-scale approximate nearest neighbor search. In: NeurIPS (Competition and Demos) (2021)

    Google Scholar 

  37. Slaney, M., Lifshits, Y., He, J.: Optimal parameters for locality-sensitive hashing. Proc. IEEE 100(9), 2604–2623 (2012)

    Article  Google Scholar 

  38. Subramanya, S.J., Devvrit, Simhadri, H.V., Krishnaswamy, R., Kadekodi, R.: Rand-nsg: Fast accurate billion-point nearest neighbor search on a single node. In: NeurIPS (2019)

    Google Scholar 

  39. Sun, P., Simcha, D., Dopson, D., Guo, R., Kumar, S.: SOAR: improved indexing for approximate nearest neighbor search. In: NeurIPS (2023)

    Google Scholar 

Download references

Acknowledgments

We thank the anonymous reviewers for their insightful comments and suggestions that helped us to improve this paper. This project received funding from the Innovation Fund Denmark for the project DIREC (9142-00001B).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Martin Aumüller .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Johnsen, M.H., Aumüller, M. (2025). An Empirical Evaluation of Search Strategies for Locality-Sensitive Hashing: Lookup, Voting, and Natural Classifier Search. In: Chávez, E., Kimia, B., Lokoč, J., Patella, M., Sedmidubsky, J. (eds) Similarity Search and Applications. SISAP 2024. Lecture Notes in Computer Science, vol 15268. Springer, Cham. https://doi.org/10.1007/978-3-031-75823-2_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-75823-2_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-75822-5

  • Online ISBN: 978-3-031-75823-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics