Abstract
Because of their computational power, today’s GPUs are increasingly harnessed to assist CPUs in high-performance computing. In addition, a growing number of state-of-the-art supercomputers include commodity GPUs to deliver unprecedented levels of performance in terms of raw GFLOPS and GFLOPS/cost. In this work, we present a GPU implementation of an image processing application of growing popularity: the 2D fast wavelet transform (2D-FWT). Based on a pair of Quadrature Mirror Filters, a complete set of application-specific optimizations is developed from a CUDA perspective to achieve outstanding gain factors over a highly optimized version of the 2D-FWT run on the CPU. An alternative approach based on the Lifting Scheme is also described in Franco et al. (Acceleration of the 2D wavelet transform for CUDA-enabled devices, 2010). Then, we investigate hardware improvements such as multicores on the CPU side, and exploit them through thread-level parallelism using the OpenMP API and pthreads. Overall, the GPU exhibits better scalability and parallel performance on large-scale images, making it a solid alternative for computing the 2D-FWT to those thread-level methods run on emerging multicore architectures.
References
Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Kruger, J., Lefohn, A.E., Purcell, T.J.: A survey of general-purpose computation on graphics hardware. J. Comput. Graph. Forum 26, 21–51 (2007)
Mallat, S.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989)
Bernabé, G., González, J., García, J.M., Duato, J.: A new lossy 3-D wavelet transform for high-quality compression of medical video. In: IEEE EMBS International Conference on Information Technology Applications in Biomedicine (2000)
Daubechies, I.: Ten lectures on wavelets. Soc. Ind. Appl. Math. (1992)
Tenllado, C., Setoain, J., Prieto, M., Piñuel, L., Tirado, F.: Parallel implementation of the 2D discrete wavelet transform on graphics processing units: filter bank versus lifting. IEEE Trans. Parallel Distrib. Syst. 19(2), 299–310 (2008)
Meerwald, P., Norcen, R., Uhl, A.: Cache issues with JPEG2000 wavelet lifting. In: VCIP, vol. 4671, pp. 626–634 (2002)
Tao, J., Shahbahrami, A., Juurlink, B., Buchty, R., Karl, W., Vassiliadis, S.: Optimizing cache performance of the discrete wavelet transform using a visualization tool. In: 9th IEEE International Symposium on Multimedia, pp. 153–160 (2007)
Shahbahrami, A., Juurlink, B., Vassiliadis, S.: Improving the memory behavior of vertical filtering in the discrete wavelet transform. In: Conference on Computing Frontiers. ACM, pp. 253–260 (2006)
Kirk, D., Hwu, W.: Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, Menlo Park. ISBN: 978-0-12-381472-2 (2010)
Intel C++ Compiler Options (Document Number: 307776-002US) (2007)
GNU compiler collection GCC http://gcc.gnu.org (2010)
OpenMP The OpenMP API. http://www.openmp.org (2010)
Moreland, K., Angel, E.: The FFT on a GPU. In: SIGGRAPH Eurographics 6th Workshop on Computer Graphics Hardware, San Diego (California, US), 26–27 July, pp. 112–119 (2003)
NVIDIA Corporation NVIDIA CUDA CUFFT Library Version 1.1 (2007)
Govindaraju, N., Lloyd, B., Dotsenko, Y., Smith, B., Manferdelli, J.: High performance discrete Fourier transforms on graphics processors. In: Proceedings Supercomputing 2008, Austin, TX (USA) (2008)
Nukada, A., Yasuhiko, O., Endo, T., Matsuoka, S.: Bandwidth intensive 3D FFT kernel for GPUs using CUDA. In: Proceedings Supercomputing 2008, Austin, TX (USA) (2008)
Wong, T.T., Leung, C.S., Heng, P.A., Wang, J.: Discrete wavelet transform on consumer-level graphics hardware. IEEE Trans. Multimedia 9(3), 668–673 (2007)
Franco, J., Bernabé, G., Fernández, J., Acacio, M.E., Ujaldón, M.: Acceleration of the 2D wavelet transform for CUDA-enabled devices. In: 10th PARA’2010: State of the Art in Scientific and Parallel Computing. Minisymposium on GPU Computing. Reykjavik (Iceland), June (2010)
Franco, J., Bernabé, G., Fernández, J., Ujaldón, M.: Parallel 3D wavelet transform on multicore CPUs and manycore GPUs. In: 10th International Conference on Computational Science. 2nd Workshop on Emerging Parallel Architectures. Amsterdam (The Netherlands), May (2010)
Sumanaweera, T., Liu, D.: Medical image reconstruction with the FFT. In: Matt Pharr (ed.) GPU Gems 2, pp. 765–784. Addison-Wesley, Reading (2005)
Additional information
This work has been supported by the Spanish MEC and EU FEDER funds under grants “Consolider Ingenio-2010 CSD2006-00046” and “TIN2006-15516-C04-03”.
About this article
Cite this article
Franco, J., Bernabé, G., Fernández, J. et al. The 2D wavelet transform on emerging architectures: GPUs and multicores. J Real-Time Image Proc 7, 145–152 (2012). https://doi.org/10.1007/s11554-011-0224-7
DOI: https://doi.org/10.1007/s11554-011-0224-7