A Case for Term Weighting Using a Dictionary on GPUs

Wakatsuki, Toshiaki; Keyaki, Atsushi; Miyazaki, Jun

doi:10.1007/978-3-319-64471-4_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10439)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

This paper demonstrates a fast method of Okapi BM25 term weighting on graphics processing units (GPUs) for information retrieval, combining a GPU-based dictionary built on a succinct data structure with data-parallel primitives. The main problem with handling documents on GPUs is processing variable-length strings, such as the documents themselves and the words they contain. Processing variable-size data leaves many cores idle, i.e., it causes load imbalance among threads, because of the single instruction multiple data (SIMD) nature of the GPU architecture. Our term weighting method is carefully composed of efficient data-parallel primitives to avoid this load imbalance. Additionally, we implemented a high-performance compressed dictionary on GPUs. Because words are converted into identifiers (IDs) with this dictionary, costly string comparisons can be avoided. Our experimental results revealed that the proposed method of term weighting on GPUs performed up to 5× faster than a MapReduce-based method on multi-core CPUs.
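The pipeline outlined in the abstract (map words to integer IDs with a dictionary, then compute BM25 weights using only data-parallel primitives) can be sketched on the CPU in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a plain dict stands in for the succinct GPU dictionary, sequential loops stand in for GPU kernels, k1 = 1.2 and b = 0.75 are common defaults rather than values taken from the paper, and the IDF uses the non-negative variant log(1 + (N - df + 0.5) / (df + 0.5)), since the abstract does not state which form is used. The names `build_dictionary`, `exclusive_scan`, and `bm25_weights` are hypothetical.

```python
import math
from itertools import groupby


def build_dictionary(docs):
    """Map every distinct word to a dense integer ID (lexicographic order).

    Hypothetical stand-in for the paper's compressed GPU dictionary:
    a plain Python dict, not a succinct trie.
    """
    vocab = sorted({w for doc in docs for w in doc})
    return {w: i for i, w in enumerate(vocab)}


def exclusive_scan(xs):
    """Exclusive prefix sum: out[i] = xs[0] + ... + xs[i-1]."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out


def bm25_weights(docs, k1=1.2, b=0.75):
    """Return (term_id, weights) with weights[(tid, doc)] = BM25 weight.

    Each step mirrors one data-parallel primitive (scan, map, sort,
    reduce-by-key); on a GPU each would be its own kernel.
    """
    term_id = build_dictionary(docs)
    n_docs = len(docs)
    doc_len = [len(d) for d in docs]
    avgdl = sum(doc_len) / n_docs

    # Scan: each document's starting offset in the flat token array.
    offsets = exclusive_scan(doc_len)
    pairs = [None] * sum(doc_len)
    # Map: replace each word by its integer ID, so later steps compare ints.
    for d, doc in enumerate(docs):
        for j, w in enumerate(doc):
            pairs[offsets[d] + j] = (term_id[w], d)

    # Sort: make identical (term, doc) pairs adjacent.
    pairs.sort()

    # Reduce-by-key on (term, doc): run length = term frequency.
    postings = [(t, d, sum(1 for _ in run)) for (t, d), run in groupby(pairs)]

    # Reduce-by-key on term (postings are sorted by term): run length = df.
    df = {t: sum(1 for _ in run)
          for t, run in groupby(postings, key=lambda p: p[0])}

    # Map: one BM25 weight per posting.
    weights = {}
    for t, d, tf in postings:
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        norm = k1 * (1 - b + b * doc_len[d] / avgdl)
        weights[(t, d)] = idf * tf * (k1 + 1) / (tf + norm)
    return term_id, weights


if __name__ == "__main__":
    docs = [["gpu", "term", "gpu"], ["term", "weight"], ["dictionary"]]
    tid, w = bm25_weights(docs)
    print(tid)  # {'dictionary': 0, 'gpu': 1, 'term': 2, 'weight': 3}
```

The sort and both reductions operate on fixed-size integer pairs, so every comparison costs the same; this is the property the dictionary is meant to provide, in contrast to variable-length string comparisons that would leave some threads idle.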

Acknowledgements

This work was partly supported by JSPS KAKENHI Grant Numbers 15H02701, 15K20990, 16H02908, 26540042, 26280115, 25240014 and 17K12684.

Author information

Authors and Affiliations

Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
Toshiaki Wakatsuki, Atsushi Keyaki & Jun Miyazaki

Corresponding author

Correspondence to Toshiaki Wakatsuki.

Editor information

Editors and Affiliations

University of Lyon, Villeurbanne, France
Djamal Benslimane
University of Milan, Milan, Italy
Ernesto Damiani
University of Michigan, Dearborn, Michigan, USA
William I. Grosky
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Wright State University, Dayton, Ohio, USA
Amit Sheth
Johannes Kepler University, Linz, Austria
Roland R. Wagner

About this paper

Cite this paper

Wakatsuki, T., Keyaki, A., Miyazaki, J. (2017). A Case for Term Weighting Using a Dictionary on GPUs. In: Benslimane, D., Damiani, E., Grosky, W., Hameurlain, A., Sheth, A., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2017. Lecture Notes in Computer Science, vol 10439. Springer, Cham. https://doi.org/10.1007/978-3-319-64471-4_10

DOI: https://doi.org/10.1007/978-3-319-64471-4_10
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64470-7
Online ISBN: 978-3-319-64471-4
eBook Packages: Computer Science, Computer Science (R0)