Event Abstract

Comparison of parallelized gray-scale zonal operations on CPU and GPU

  • 1 Catholic University Péter Pázmány, Faculty of Information Technology, Hungary
  • 2 University of California, EECS Department, United States
  • 3 IMEC, Belgium

Parallelization of image processing algorithms can be achieved either on the Central Processing Unit (CPU) or on the Graphics Processing Unit (GPU). The availability of cheap computing power, together with the increase in image resolution and size, makes the design of parallel algorithms for neuroinformatic applications very attractive. Since the advent of NVIDIA's Compute Unified Device Architecture (CUDA), GPU computing has grown in popularity. On the other hand, the limitations of the CUDA architecture and the capabilities of the hardware need to be considered upfront in software design in order to achieve maximal performance. CUDA-enabled GPUs have four types of memory, namely global memory, constant memory, texture memory and shared memory, which have distinct physical properties. The advantages of the massively parallel GPU architecture are offset by the transfer overhead between GPU memory and main memory (RAM), which makes GPU implementations of certain classes of algorithms impractical. Zonal image processing operators, such as image convolutions, erosions and dilations, have ubiquitous applications in image processing. Such operators have complexity of O(N^2) for non-separable kernels and do not depend on the output of other processed pixels. Consequently, these operations do not require synchronization between threads and provide ample opportunities for speedup if executed in parallel. This makes zonal operations a suitable best-case scenario for algorithmic parallelization. In this work we compare CPU-based and CUDA-based implementations of the basic morphological operations and spatial convolution. The CPU-based parallelization was implemented using the OpenMP library in C. For completeness, a parallel Java implementation was also developed. The results indicate that the most advantageous GPU implementation is achieved by using texture memory.
Our results show an advantage of GPU parallelization over sequential implementation on the CPU for both convolutions and mathematical morphology operations. The CPU tests were run on a 4-core Intel Core i7-920 CPU with a 2.67 GHz clock and 4 GB RAM. Using a 3x3 kernel, the speedup of CUDA on a high-end NVIDIA GeForce GTX 470 platform ranged from 177 to 208 times for convolution and dilation, respectively, against a sequential implementation (Fig. 1). On the other hand, the CUDA speedup ranged from 18 to 36 times, respectively, for the same image sizes against an optimized OpenMP implementation on the CPU. Finally, the CUDA-enabled morphology operations were incorporated into an ImageJ plugin. It was demonstrated that the overhead of Java was negligible, which makes this a viable option for integrating GPU code into Java programs. The widespread use of ImageJ, together with the availability of GPUs, makes it attractive to further exploit GPU parallelization of image processing algorithms on this platform.
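
The zonal operations discussed above can be illustrated with a minimal sketch in C. It is not the authors' implementation: the function names dilate3x3 and convolve3x3, the row-major image layout and the border handling (out-of-image neighbors are skipped for dilation and treated as zero for convolution) are all assumptions made for illustration. Each output pixel depends only on input pixels, so the outer loop can be distributed over threads with a single OpenMP directive and no synchronization.

```c
#include <stddef.h>

/* Gray-scale dilation with a flat 3x3 structuring element (illustrative sketch).
   Assumption: neighbors falling outside the image are simply skipped. */
void dilate3x3(const unsigned char *src, unsigned char *dst,
               int width, int height)
{
    /* Rows are independent, so they can be processed in parallel.
       Without OpenMP support the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            unsigned char best = 0;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int yy = y + dy, xx = x + dx;
                    if (yy < 0 || yy >= height || xx < 0 || xx >= width)
                        continue;
                    unsigned char v = src[(size_t)yy * width + xx];
                    if (v > best)
                        best = v;
                }
            }
            dst[(size_t)y * width + x] = best;
        }
    }
}

/* Spatial convolution with a non-separable 3x3 kernel (illustrative sketch).
   Assumption: pixels outside the image contribute zero. */
void convolve3x3(const float *src, float *dst, const float kernel[9],
                 int width, int height)
{
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    /* Convolution flips the kernel: sample at (y - dy, x - dx). */
                    int yy = y - dy, xx = x - dx;
                    if (yy < 0 || yy >= height || xx < 0 || xx >= width)
                        continue;
                    sum += kernel[(dy + 1) * 3 + (dx + 1)]
                         * src[(size_t)yy * width + xx];
                }
            }
            dst[(size_t)y * width + x] = sum;
        }
    }
}
```

Both functions write only to dst and read only from src, which is why no locking is needed. Compiling with an OpenMP-enabled compiler (for example with the -fopenmp flag in GCC) enables the parallel loops; erosion follows the same pattern with a minimum in place of the maximum.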

Figure 1

Acknowledgements

The work was partially supported by a grant from the Scientific Research Foundation of Flanders, FWO (grant G.0C75.13N, D. Prodanov).

Keywords: mathematical morphology, linear filters, OpenMP, CUDA, C, Java

Conference: Neuroinformatics 2013, Stockholm, Sweden, 27 Aug - 29 Aug, 2013.

Presentation Type: Poster

Topic: General neuroinformatics

Citation: Kurczina G, Salmani V and Prodanov D (2013). Comparison of parallelized gray-scale zonal operations on CPU and GPU. Front. Neuroinform. Conference Abstract: Neuroinformatics 2013. doi: 10.3389/conf.fninf.2013.09.00006

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 04 Apr 2013; Published Online: 11 Jul 2013.

* Correspondence: Dr. Dimiter Prodanov, IMEC, Leuven, 3001, Belgium, dimiterpp@gmail.com