iBet uBet web content aggregator. Adding the entire web to your favor.

Link to original content: https://api.crossref.org/works/10.1145/3394116
{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,7,21]],"date-time":"2024-07-21T16:30:44Z","timestamp":1721579444167},"reference-count":28,"publisher":"Association for Computing Machinery (ACM)","issue":"3","funder":[{"DOI":"10.13039\/100013266","name":"Science and Technology Facilities Council","doi-asserted-by":"publisher","award":["ST\/R000557\/1"],"id":[{"id":"10.13039\/100013266","id-type":"DOI","asserted-by":"publisher"}]}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["ACM Trans. Archit. Code Optim."],"published-print":{"date-parts":[[2020,9,30]]},"abstract":"We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language), which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm utilizing the NVIDIA FFT library (cuFFT). 
We demonstrate that by using a shared-memory-based FFT, we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.","DOI":"10.1145\/3394116","type":"journal-article","created":{"date-parts":[[2020,7,7]],"date-time":"2020-07-07T12:32:10Z","timestamp":1594125130000},"page":"1-20","update-policy":"http:\/\/dx.doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":7,"title":["GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory"],"prefix":"10.1145","volume":"17","author":[{"ORCID":"http:\/\/orcid.org\/0000-0003-2797-0595","authenticated-orcid":false,"given":"Karel","family":"Ad\u00e1mek","sequence":"first","affiliation":[{"name":"Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, United Kingdom"}]},{"given":"Sofia","family":"Dimoudi","sequence":"additional","affiliation":[{"name":"Centre for Advanced Instrumentation, Durham University, Durham, United Kingdom"}]},{"given":"Mike","family":"Giles","sequence":"additional","affiliation":[{"name":"Mathematical Institute, University of Oxford, Oxford, United Kingdom"}]},{"given":"Wesley","family":"Armour","sequence":"additional","affiliation":[{"name":"Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, United Kingdom"}]}],"member":"320","published-online":{"date-parts":[[2020,8,3]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of the 27th Astronomical Data Analysis Software and Systems Conference (ADASS\u201917)","author":"Ad\u00e1mek K.","unstructured":"K. Ad\u00e1mek , S. Dimoudi , M. Giles , and W. Armour . 2017. Improved acceleration of the GPU fourier domain acceleration search algorithm . In Proceedings of the 27th Astronomical Data Analysis Software and Systems Conference (ADASS\u201917) . arxiv:astro-ph.IM\/1711.10855 K. Ad\u00e1mek, S. Dimoudi, M. Giles, and W. Armour. 2017. 
Improved acceleration of the GPU fourier domain acceleration search algorithm. In Proceedings of the 27th Astronomical Data Analysis Software and Systems Conference (ADASS\u201917). arxiv:astro-ph.IM\/1711.10855"},{"key":"#cr-split#-e_1_2_1_2_1.1","doi-asserted-by":"crossref","unstructured":"K. Ad\u00e1mek J. Novotn\u00fd and W. Armour. 2016. A polyphase filter for many-core architectures. Astron. Comput. 16 (July 2016) 1--16. DOI:https:\/\/doi.org\/10.1016\/j.ascom.2016.03.003 arxiv:astro-ph.IM\/1511.03599 10.1016\/j.ascom.2016.03.003","DOI":"10.1016\/j.ascom.2016.03.003"},{"key":"#cr-split#-e_1_2_1_2_1.2","doi-asserted-by":"crossref","unstructured":"K. Ad\u00e1mek J. Novotn\u00fd and W. Armour. 2016. A polyphase filter for many-core architectures. Astron. Comput. 16 (July 2016) 1--16. DOI:https:\/\/doi.org\/10.1016\/j.ascom.2016.03.003 arxiv:astro-ph.IM\/1511.03599","DOI":"10.1016\/j.ascom.2016.03.003"},{"key":"e_1_2_1_3_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1967.5957"},{"key":"e_1_2_1_4_1","doi-asserted-by":"publisher","DOI":"10.3847\/1538-4365\/aabe88"},{"key":"e_1_2_1_5_1","volume-title":"Proceedings of the 25th Astronomical Data Analysis Software and Systems Conference (ADASS\u201915)","author":"Dimoudi S.","unstructured":"S. Dimoudi and W. Armour . 2015. Pulsar acceleration searches on the GPU for the square kilometre array . In Proceedings of the 25th Astronomical Data Analysis Software and Systems Conference (ADASS\u201915) . arxiv:astro-ph.IM\/1511.07343. S. Dimoudi and W. Armour. 2015. Pulsar acceleration searches on the GPU for the square kilometre array. In Proceedings of the 25th Astronomical Data Analysis Software and Systems Conference (ADASS\u201915). arxiv:astro-ph.IM\/1511.07343."},{"key":"e_1_2_1_6_1","volume-title":"Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2644--2648","author":"Dobashi T.","year":"2013","unstructured":"T. Dobashi and H. Kiya . 2013. 
A parallel implementation method of FFT-based full-search block matching algorithms . In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2644--2648 . DOI:https:\/\/doi.org\/10.1109\/ICASSP. 2013 .6638135 10.1109\/ICASSP.2013.6638135 T. Dobashi and H. Kiya. 2013. A parallel implementation method of FFT-based full-search block matching algorithms. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. 2644--2648. DOI:https:\/\/doi.org\/10.1109\/ICASSP.2013.6638135"},{"key":"e_1_2_1_7_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICIP.2013.6738105"},{"key":"e_1_2_1_8_1","doi-asserted-by":"publisher","DOI":"10.1145\/2400682.2400684"},{"key":"e_1_2_1_9_1","doi-asserted-by":"publisher","DOI":"10.1109\/SC.2008.5213922"},{"key":"e_1_2_1_10_1","volume-title":"Zapata","author":"Gutierrez Eladio","year":"2008","unstructured":"Eladio Gutierrez , Sergio Romero , Maria A. Trenas , and Emilio L . Zapata . 2008 . Memory Locality Exploitation Strategies for FFT on the CUDA Architecture. Springer-Verlag , Berlin, 430--443. https:\/\/doi.org\/10.1007\/978-3-540-92859-1_39 10.1007\/978-3-540-92859-1_39 Eladio Gutierrez, Sergio Romero, Maria A. Trenas, and Emilio L. Zapata. 2008. Memory Locality Exploitation Strategies for FFT on the CUDA Architecture. Springer-Verlag, Berlin, 430--443. https:\/\/doi.org\/10.1007\/978-3-540-92859-1_39"},{"key":"e_1_2_1_11_1","doi-asserted-by":"publisher","DOI":"10.1090\/S0025-5718-1965-0178586-1"},{"key":"e_1_2_1_12_1","doi-asserted-by":"publisher","DOI":"10.1109\/ACCESS.2019.2918851"},{"key":"e_1_2_1_13_1","unstructured":"A. Lavin and S. Gray. 2015. Fast algorithms for convolutional neural networks. ArXiv e-prints arxiv:1509.09308. A. Lavin and S. Gray. 2015. Fast algorithms for convolutional neural networks. ArXiv e-prints arxiv:1509.09308."},{"key":"e_1_2_1_14_1","volume-title":"Understanding Digital Signal Processing","author":"Lyons R. G.","unstructured":"R. 
G. Lyons . 2011. Understanding Digital Signal Processing . Prentice Hall . R. G. Lyons. 2011. Understanding Digital Signal Processing. Prentice Hall."},{"key":"e_1_2_1_15_1","doi-asserted-by":"publisher","DOI":"10.5555\/844174.844191"},{"key":"e_1_2_1_16_1","doi-asserted-by":"publisher","DOI":"10.1109\/LSP.2006.879475"},{"key":"e_1_2_1_17_1","unstructured":"NVIDIA. 2019. NVIDIA CUDA Deep Neural Network Library (cuDNN). Retrieved from https:\/\/developer.nvidia.com\/cudnn. NVIDIA. 2019. NVIDIA CUDA Deep Neural Network Library (cuDNN). Retrieved from https:\/\/developer.nvidia.com\/cudnn."},{"key":"e_1_2_1_18_1","unstructured":"NVIDIA. 2019. NVIDIA CUDA Fast Fourier Transform Library (cuFFT). Retrieved from https:\/\/developer.nvidia.com\/cufft. NVIDIA. 2019. NVIDIA CUDA Fast Fourier Transform Library (cuFFT). Retrieved from https:\/\/developer.nvidia.com\/cufft."},{"key":"e_1_2_1_19_1","volume-title":"Numerical Recipes in C: The Art of Scientific Computing","author":"Press W. H.","unstructured":"W. H. Press . 1992. Numerical Recipes in C: The Art of Scientific Computing . Cambridge University Press . W. H. Press. 1992. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press."},{"key":"#cr-split#-e_1_2_1_20_1.1","unstructured":"Andrew Richards. 2015. University of Oxford Advanced Research Computing. DOI:https:\/\/doi.org\/10.5281\/zenodo.22558 10.5281\/zenodo.22558"},{"key":"#cr-split#-e_1_2_1_20_1.2","unstructured":"Andrew Richards. 2015. University of Oxford Advanced Research Computing. DOI:https:\/\/doi.org\/10.5281\/zenodo.22558"},{"key":"e_1_2_1_21_1","volume-title":"Computational Frameworks for the Fast Fourier Transform","author":"Loan C. Van","year":"1970","unstructured":"C. Van Loan . 1992. Computational Frameworks for the Fast Fourier Transform . Society for Industrial and Applied Mathematics. Retrieved from arXiv:http:\/\/epubs.siam.org\/doi\/pdf\/10.1137\/1.978161 1970 999. C. Van Loan. 1992. 
Computational Frameworks for the Fast Fourier Transform. Society for Industrial and Applied Mathematics. Retrieved from arXiv:http:\/\/epubs.siam.org\/doi\/pdf\/10.1137\/1.9781611970999."},{"key":"e_1_2_1_22_1","unstructured":"N. Vasilache J. Johnson M. Mathieu S. Chintala S. Piantino and Y. LeCun. 2014. Fast convolutional nets with fbfft: A GPU performance evaluation. ArXiv e-prints arxiv:cs.LG\/1412.7580. https:\/\/research.fb.com\/wp-content\/uploads\/2016\/11\/fast-convolutional-nets-with-fbfft-a-gpu-performance-evaluation.pdf."},{"key":"e_1_2_1_23_1","volume-title":"Fitting FFT onto the G80 architecture","author":"Volkov Vasily","year":"2008","unstructured":"Vasily Volkov and Brian Kazian. 2008. Fitting FFT onto the G80 architecture. University of California, Berkeley (2008). https:\/\/pdfs.semanticscholar.org\/eb3a\/82ddfc4e73de18a4004ecb9c1109730ae3eb.pdf."},{"key":"e_1_2_1_24_1","unstructured":"W. Armour K. Ad\u00e1mek J. Novotn\u00fd S. Dimoudi C. Carels and N. Ouannoughi. 2019. AstroAccelerate. https:\/\/github.com\/AstroAccelerateOrg\/astro-accelerate.git."},{"key":"e_1_2_1_25_1","volume-title":"Proceedings of the International Conference on Acoustics (AIA-DAGA\u201913)","author":"Wefers Frank","year":"2013","unstructured":"Frank Wefers and Michael Vorl\u00e4nder . 2013 . 
Using fast convolution for FIR filtering Overview and guidelines for real-time audio rendering . In Proceedings of the International Conference on Acoustics (AIA-DAGA\u201913) . Retrieved from http:\/\/pub.dega-akustik.de\/AIA_DAGA_ 2013\/data\/articles\/000683.pdf. Frank Wefers and Michael Vorl\u00e4nder. 2013. Using fast convolution for FIR filtering Overview and guidelines for real-time audio rendering. In Proceedings of the International Conference on Acoustics (AIA-DAGA\u201913). Retrieved from http:\/\/pub.dega-akustik.de\/AIA_DAGA_2013\/data\/articles\/000683.pdf."},{"key":"e_1_2_1_26_1","volume-title":"A Highly Efficient FFT Using Shared-memory Multiplexing","author":"Yang Yi","unstructured":"Yi Yang and Huiyang Zhou . 2014. A Highly Efficient FFT Using Shared-memory Multiplexing . Springer International Publishing , Cham , 363--377. DOI:https:\/\/doi.org\/10.1007\/978-3-319-06548-9_17 10.1007\/978-3-319-06548-9_17 Yi Yang and Huiyang Zhou. 2014. A Highly Efficient FFT Using Shared-memory Multiplexing. Springer International Publishing, Cham, 363--377. 
DOI:https:\/\/doi.org\/10.1007\/978-3-319-06548-9_17"}],"container-title":["ACM Transactions on Architecture and Code Optimization"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3394116","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,1,1]],"date-time":"2023-01-01T09:18:11Z","timestamp":1672564691000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/3394116"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2020,8,3]]},"references-count":28,"journal-issue":{"issue":"3","published-print":{"date-parts":[[2020,9,30]]}},"alternative-id":["10.1145\/3394116"],"URL":"http:\/\/dx.doi.org\/10.1145\/3394116","relation":{},"ISSN":["1544-3566","1544-3973"],"issn-type":[{"value":"1544-3566","type":"print"},{"value":"1544-3973","type":"electronic"}],"subject":[],"published":{"date-parts":[[2020,8,3]]},"assertion":[{"value":"2019-10-01","order":0,"name":"received","label":"Received","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-04-01","order":1,"name":"accepted","label":"Accepted","group":{"name":"publication_history","label":"Publication History"}},{"value":"2020-08-03","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}