Link to original content: https://doi.org/10.1007/978-3-319-64471-4_10
A Case for Term Weighting Using a Dictionary on GPUs

  • Conference paper
Database and Expert Systems Applications (DEXA 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10439)

Abstract

This paper demonstrates a fast method of Okapi BM25 term weighting on graphics processing units (GPUs) for information retrieval. The method combines a GPU-based dictionary built on a succinct data structure with data-parallel primitives. The main difficulty in handling documents on GPUs lies in processing variable-length strings, such as words and the documents themselves. Because of the single instruction, multiple data (SIMD) nature of the GPU architecture, processing data of variable size leaves many cores idle, i.e., it causes load imbalance among threads. Our term weighting method is carefully composed of efficient data-parallel primitives to avoid this load imbalance. Additionally, we implemented a high-performance compressed dictionary on GPUs. Because this dictionary converts words into identifiers (IDs), costly string comparisons can be avoided. Our experimental results show that the proposed method of term weighting on GPUs performed up to 5× faster than a MapReduce-based method on multi-core CPUs.
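
The pipeline described in the abstract can be illustrated on the CPU with a minimal Python sketch. It is not the authors' GPU implementation. A plain `dict` stands in for the compressed dictionary, and a sort followed by a grouped count stands in for the sort and reduce-by-key primitives that a GPU version might use. The function names, the parameters `k1 = 1.2` and `b = 0.75`, and the particular BM25 and IDF variant are assumptions made for illustration, since the abstract does not specify them.

```python
import math
from itertools import groupby


def build_dictionary(docs):
    """Assign each distinct word an integer ID.

    Assumption: a plain dict replaces the paper's compressed GPU dictionary.
    """
    word_to_id = {}
    for doc in docs:
        for word in doc:
            word_to_id.setdefault(word, len(word_to_id))
    return word_to_id


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return (word_to_id, weights), where weights maps (doc_id, term_id) to a BM25 weight.

    docs is a non-empty list of tokenized documents (lists of words).
    """
    word_to_id = build_dictionary(docs)

    # Replace variable-length strings with fixed-size (doc_id, term_id) pairs,
    # so later steps compare integers instead of strings.
    pairs = [(d, word_to_id[w]) for d, doc in enumerate(docs) for w in doc]

    doc_len = [len(doc) for doc in docs]
    avg_len = sum(doc_len) / len(docs)

    # Sort, then count runs of equal keys: the sequential analogue of
    # sort followed by reduce-by-key. The result is the term frequency.
    pairs.sort()
    tf = {key: sum(1 for _ in run) for key, run in groupby(pairs)}

    # Document frequency: number of distinct documents containing each term.
    df = {}
    for _, term_id in tf:
        df[term_id] = df.get(term_id, 0) + 1

    n_docs = len(docs)
    weights = {}
    for (d, t), f in tf.items():
        # Assumed IDF variant with +1 inside the log, which keeps weights non-negative.
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        length_norm = k1 * (1.0 - b + b * doc_len[d] / avg_len)
        weights[(d, t)] = idf * f * (k1 + 1.0) / (f + length_norm)
    return word_to_id, weights


if __name__ == "__main__":
    sample = [["gpu", "term", "gpu"], ["term", "weighting"]]
    ids, w = bm25_weights(sample)
    for (doc_id, term_id), value in sorted(w.items()):
        print(doc_id, term_id, round(value, 4))
```

Once words are mapped to IDs, every record has the same size. That is the property that lets sorting and keyed reduction distribute work evenly across threads on a GPU.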

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers 15H02701, 15K20990, 16H02908, 26540042, 26280115, 25240014 and 17K12684.

Author information

Corresponding author

Correspondence to Toshiaki Wakatsuki.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wakatsuki, T., Keyaki, A., Miyazaki, J. (2017). A Case for Term Weighting Using a Dictionary on GPUs. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_10

  • DOI: https://doi.org/10.1007/978-3-319-64471-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64470-7

  • Online ISBN: 978-3-319-64471-4

  • eBook Packages: Computer Science, Computer Science (R0)
