iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://doi.org/10.1007/s13735-017-0140-0
Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval | International Journal of Multimedia Information Retrieval Skip to main content
Log in

Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval

  • Regular Paper
  • Published:
International Journal of Multimedia Information Retrieval Aims and scope Submit manuscript

Abstract

Effective partitioning multimedia indexes is key for efficient kNN search. But existing algorithms are based on document similarity, without partition size or redundancy constraints. Our goal is to create an index partitioning algorithm that addresses the specific properties of a distributed system: load balancing across nodes, redundancy in node failure and efficient node usage under concurrent querying. We propose the representation of data with overcomplete codebooks. Each document is quantized into a small set of codewords and indexed on per-codeword partitions. Quantization algorithms are designed to fit data as best as possible, leading to a bias toward codewords that fit the principal directions of data in the original space. In this paper, we propose the balanced KSVD (B-KSVD) algorithm: It distributes data uniformly across codewords, according to the distribution in the original space. The comprehensive experiments focused on measuring the effectiveness of partition size balancing and retrieval quality. Results show that B-KSVD better balances partition sizes (i.e., lower SD in partition size distribution), compared to k-means and KSVD baselines. B-KSVD achieves 38% 1-recall by inspecting only 1% of the full index, distributed over 10 partitions. k-means creates partitions with higher size variation and requires either larger codebooks or the inspection of larger portions of the index to achieve similar retrieval performance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

Notes

  1. http://corpus-texmex.irisa.fr/.

References

  1. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322

    Article  MATH  Google Scholar 

  2. Aly M, Munich M, Perona P (2011) Distributed Kd-trees for retrieval from very large image collections. In: Proceedings of BMVC

  3. Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. ACM Commun 51(1):117–122

    Article  Google Scholar 

  4. Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of ACM-SIAM SODA, pp 1027–1035

  5. Babenko A, Lempitsky V (2015) The inverted multi-index. IEEE Trans PAMI 37(6):1247–1260. https://doi.org/10.1109/TPAMI.2014.2361319

    Article  Google Scholar 

  6. Babenko A, Lempitsky V (2016) Efficient indexing of billion-scale datasets of deep descriptors. In: Proceedings of IEEE CVPR

  7. Batko M, Falchi F, Lucchese C, Novak D, Perego R, Rabitti F, Sedmidubsky J, Zezula P (2010) Building a web-scale image similarity search system. Multimed Tool Appl 47(3):599–629

    Article  Google Scholar 

  8. Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell

    Book  MATH  Google Scholar 

  9. Borges P, Mourão A, Magalhães J (2015) High-dimensional indexing by sparse approximation. In: ACM ICMR’15, ACM

  10. Büttcher S, Clarke CL, Cormack GV (2010) Information retrieval: implementing and evaluating search engines. MIT Press, Cambridge

    MATH  Google Scholar 

  11. Cherian A, Sra S, Morellas V, Papanikolopoulos N (2014) Efficient nearest neighbors via robust sparse hashing. IEEE TIP 23(8):3646–3655

    MathSciNet  MATH  Google Scholar 

  12. Chum O, Philbin J, Zisserman A (2008) Near duplicate image detection: min-hash and tf-idf weighting. In: Proceedings of BMVC

  13. Datar M, Immorlica N, Indyk P, Mirrokni VS (2004) Locality-sensitive hashing scheme based on p-stable distributions. In: Proceedings of SCG, ACM, pp 253–262

  14. Dean J, Ghemawat S (2008) Mapreduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

    Article  Google Scholar 

  15. Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of KDD, pp 226–231

  16. Grauman K, Fergus R (2013) Chap: Learning binary hash codes for large-scale image search. In: Machine learning for computer vision. Springer, Berlin, pp 49–87. https://doi.org/10.1007/978-3-642-28661-2_3

  17. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507. https://doi.org/10.1126/science.1127647

    Article  MathSciNet  MATH  Google Scholar 

  18. Hoerl AE, Kennard RW (2004) Ridge regression. Wiley, New York. https://doi.org/10.1002/0471667196.ess2280

    Book  MATH  Google Scholar 

  19. Jégou H, Douze M, Schmid C (2011) Product quantization for nearest neighbor search. IEEE Trans PAMI 33(1):117–128. https://doi.org/10.1109/TPAMI.2010.57

    Article  Google Scholar 

  20. Jégou H, Tavenard R, Douze M, Amsaleg L (2011) Searching in one billion vectors: re-rank with source coding. arXiv e-prints arXiv:1102.3828

  21. Ji R, Duan LY, Chen J, Xie L, Yao H, Gao W (2013) Learning to distribute vocabulary indexing for scalable visual search. IEEE Trans Multimed 15(1):153–166. https://doi.org/10.1109/TMM.2012.2225035

    Article  Google Scholar 

  22. Kalantidis Y, Avrithis Y (2014) Locally optimized product quantization for approximate nearest neighbor search. In: Proceedings of IEEE CVPR, pp 2329–2336. https://doi.org/10.1109/CVPR.2014.298

  23. Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In: Proceedings of ACM STOC, STOC ’97, pp 654–663

  24. Kulkarni A, Callan J (2010) Document allocation policies for selective searching of distributed indexes. In: Proceedings of ACM CIKM, CIKM ’10, pp 449–458. https://doi.org/10.1145/1871437.1871497

  25. Lewicki MS, Sejnowski TJ (2000) Learning overcomplete representations. Neural Comput 12(2):337–365. https://doi.org/10.1162/089976600300015826

    Article  Google Scholar 

  26. Li Z, Ning H, Cao L, Zhan T, Gong Y, Huang TS (2011) Learning to search efficiently in high dimensions. In: Neural information processing systems

  27. Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137

    Article  MathSciNet  MATH  Google Scholar 

  28. Magalhaes J, Rueger S (2007) High-dimensional visual vocabularies for image retrieval. In: ACM SIGIR’07, ACM, New York, NY, USA, pp 815–816. https://doi.org/10.1145/1277741.1277923

  29. Moise D, Shestakov D, Gudmundsson G, Amsaleg L (2013) Indexing and searching 100m images with map-reduce. In: ICMR’13, pp 17–24. https://doi.org/10.1145/2461466.2461470

  30. Mourão A, Magalhães Ja (2015) Scalable multimodal search with distributed indexing by sparse hashing. In: ACM ICMR’15, ACM, New York, NY, USA, pp 283–290. https://doi.org/10.1145/2671188.2749310

  31. Muja M, Lowe DG (2014) Scalable nearest neighbor algorithms for high dimensional data. IEEE Trans PAMI 36:2227–2240

    Article  Google Scholar 

  32. Olshausen BA, Field DJ (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609

    Article  Google Scholar 

  33. Pati Y, Rezaiifar R, Krishnaprasad P (1993) Orthogonal Matching Pursuit : recursive function approximation with application to wavelet decomposition. In: Asilomar Conference on Signals, Systems and Computer

  34. Raginsky M, Lazebnik S (2009) Locality-sensitive binary codes from shift-invariant kernels. In: NIPS, pp 1509–1517

  35. Tavenard R, Jégou H, Amsaleg L (2011) Balancing clusters to reduce response time variability in large scale image search. In: International workshop on content-based multimedia indexing (CBMI 2011), Madrid, Spain. http://hal.inria.fr/inria-00576886,qUAERO

  36. Thaler DG, Ravishankar CV (1998) Using name-based mappings to increase hit rates. Trans Netw 6(1):1–14. https://doi.org/10.1109/90.663936

    Article  Google Scholar 

  37. Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288

    MathSciNet  MATH  Google Scholar 

  38. Torralba A, Fergus R, Freeman W (2008) 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans PAMI 30(11):1958–1970. https://doi.org/10.1109/TPAMI.2008.128

    Article  Google Scholar 

  39. Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of VLDB, pp 194–205

  40. Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. NIPS 9(1):6

    Google Scholar 

  41. Yang Z, Kamata SI, Ahrary A (2009) NIR: content based image retrieval on cloud computing. Proc ICIS 3:556–559

    Google Scholar 

  42. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B 67:301–320

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This work has been partially funded by the CMU Portugal research project GoLocal Ref. CMUP-ERI/TIC/0033 /2014, by the H2020 ICT project COGNITUS with the Grant Agreement No. 687605, by the project NOVA LINCS Ref. UID/CEC/04516/2013 and by the Ph.D. scholarship Grant SFRH/BD/95064/2013.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to André Mourão.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mourão, A., Magalhães, J. Balancing search space partitions by sparse coding for distributed redundant media indexing and retrieval. Int J Multimed Info Retr 7, 57–70 (2018). https://doi.org/10.1007/s13735-017-0140-0

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13735-017-0140-0

Keywords

Navigation