| Peer-Reviewed

Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity

Received: 2 February 2016     Accepted: 6 February 2016     Published: 24 June 2016
Views:       Downloads:
Abstract

Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.

Published in American Journal of Software Engineering and Applications (Volume 5, Issue 3-1)

This article belongs to the Special Issue Advances in Computer Science and Information Technology in Developing Countries

DOI 10.11648/j.ajsea.s.2016050301.13
Page(s) 10-14
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2016. Published by Science Publishing Group

Keywords

Nearest Neighbor, CUDA, GPU, Shared Memory, Parallelism

References
[1] M., Guevara, G., Chris, K., Hazelwood, and K., Skadron. "Enabling task parallelism in the cuda scheduler." In Workshop on Programming Models for Emerging Architectures, 2009, pp. 69-76.
[2] V., Garcia,, E., Debreuve, F., Nielsen, and M., Barlaud,. “K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching”. In Image Processing (ICIP), 2010, September. 17th IEEE International Conference on (pp. 3757-3760).
[3] CUDA C Programming Guide: CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/,cuda-c-programming-guide/index.html.
[4] S., Liang, Y., Liu, C. Wang, and L., Jian,. “Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU”. In Web Society (SWS), 2010 IEEE 2nd Symposium on (pp. 53-60). IEEE, 2010, August.
[5] Hawick, Kenneth A., Arno Leist, and Daniel P. Playne. "Parallel graph component labelling with GPUs and CUDA". Parallel Computing (2010) 36, no. 12 655-678.
[6] S., Liang, Y., Liu, C., Wang, and L., Jian, 2009, October. “A CUDA-based parallel implementation of K-nearest neighbor algorithm”. In Cyber-Enabled Distributed Computing and Knowledge Discovery, 2009. CyberC'09. International Conference on (pp. 291-296). IEEE
[7] D., Qiu, S., May, & A., Nüchter. “GPU-accelerated nearest neighbor search for 3D registration”. In Computer Vision Systems, 2009. (pp. 194-203). Springer Berlin Heidelberg.
[8] V., Garcia, E., Debreuve, & M. Barlaud, "Fast k nearest neighbor search using GPU". In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. 2008, June. IEEE Computer Society Conference on (pp. 1-6). IEEE.
[9] N. Mohammadi. "Presentation of a Parallel Algorithm for Nearest Neighbor Search on GPU Using CUDA". Current Trends in Technology and Sciences (CTTS) 2015.
[10] J., Nickolls, I., Buck, M., Garland, & K.Skadron, "Scalable parallel programming with CUDA". Queue, 6(2), 2008. 40-53.
[11] Nourzad. "The introduction of context-sensitive hashing algorithm for finding nearest neighbors in high-dimensional spaces". Amirkabir University of Technology, (2010).
[12] X., Wu, V., Kumar, J. R., Quinlan, J., Ghosh, Q., Yang, H., Motoda, & D. Steinberg. "Top 10 algorithms in data mining. Knowledge and Information Systems", 2008.14(1), 1-37.
[13] S., Liang, C., Wang, Y., Liu, and L., Jian,. “CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU”. In Information, Computing and Telecommunication, 2009. YC-ICT'09. 2009, September. IEEE Youth Conference on (pp. 415-418).
[14] A. Neumann, " Parallel reduction of multidimensional arrays for supporting online analytical processing (olap) on a graphics processing unit (gpu)". The University of Western Australia, 2008.
[15] S. Ryoo, C. I Rodrigues, S. S Baghsorkhi, S. S Stone, D. B. Kirk,, & W. M. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA".. (2008, February) In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. (pp. 73-82). ACM.
Cite This Article
  • APA Style

    Neda Mohammadi. (2016). Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. American Journal of Software Engineering and Applications, 5(3-1), 10-14. https://doi.org/10.11648/j.ajsea.s.2016050301.13

    Copy | Download

    ACS Style

    Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am. J. Softw. Eng. Appl. 2016, 5(3-1), 10-14. doi: 10.11648/j.ajsea.s.2016050301.13

    Copy | Download

    AMA Style

    Neda Mohammadi. Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity. Am J Softw Eng Appl. 2016;5(3-1):10-14. doi: 10.11648/j.ajsea.s.2016050301.13

    Copy | Download

  • @article{10.11648/j.ajsea.s.2016050301.13,
      author = {Neda Mohammadi},
      title = {Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity},
      journal = {American Journal of Software Engineering and Applications},
      volume = {5},
      number = {3-1},
      pages = {10-14},
      doi = {10.11648/j.ajsea.s.2016050301.13},
      url = {https://doi.org/10.11648/j.ajsea.s.2016050301.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajsea.s.2016050301.13},
      abstract = {Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.},
     year = {2016}
    }
    

    Copy | Download

  • TY  - JOUR
    T1  - Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity
    AU  - Neda Mohammadi
    Y1  - 2016/06/24
    PY  - 2016
    N1  - https://doi.org/10.11648/j.ajsea.s.2016050301.13
    DO  - 10.11648/j.ajsea.s.2016050301.13
    T2  - American Journal of Software Engineering and Applications
    JF  - American Journal of Software Engineering and Applications
    JO  - American Journal of Software Engineering and Applications
    SP  - 10
    EP  - 14
    PB  - Science Publishing Group
    SN  - 2327-249X
    UR  - https://doi.org/10.11648/j.ajsea.s.2016050301.13
    AB  - Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.
    VL  - 5
    IS  - 3-1
    ER  - 

    Copy | Download

Author Information
  • Department of Computer Engineering, Shiraz University of Technology, Shiraz, Iran

  • Sections