Structural Mapping of Global Optimization Algorithms to Graphics Processing Unit Architecture

Authors: Seliverstov E.Yu. Published: 03.07.2022
Published in issue: #2(139)/2022  
DOI: 10.18698/0236-3933-2022-2-42-59

Category: Informatics, Computer Engineering and Control | Chapter: Mathematical Modelling, Numerical Methods, and Program Complexes  
Keywords: global optimization, task mapping, task scheduling, structural mapping, parallel systems, graphics processing units


Graphics processing units (GPUs) deliver high execution efficiency for modern metaheuristic algorithms with high computational complexity. An optimal mapping of the optimization algorithm onto the parallel system architecture is crucial, since it strongly affects the efficiency of the optimization process. The paper proposes a novel algorithm for mapping a parallel metaheuristic algorithm onto the GPU architecture, states the problem of mapping the algorithm graph model onto the GPU model, and gives a formal definition of graph mapping and mapping restrictions. The algorithm graph model is hierarchical, combining an island parallel model with a model of the metaheuristic optimization algorithm. Defining the set of feasible mappings through mapping restrictions makes it possible to formalize the features of the GPU architecture and the parallel model. The structural mapping algorithm is based on cooperatively solving the optimization problem itself and the discrete optimization problem of mapping the structural model. The study outlines parallel efficiency criteria which can be evaluated both experimentally and analytically to predict model efficiency. The experimental section introduces a parallel optimization algorithm based on the proposed structural mapping algorithm; experimental results comparing the parallel efficiency of the parallel and sequential algorithms are presented and discussed.
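The abstract's notions of a feasible mapping (tasks of an island-model graph assigned to GPU units subject to restrictions) and of a parallel efficiency criterion can be illustrated with a minimal sketch. All names below (`is_feasible`, `enumerate_feasible`, `parallel_efficiency`, the toy graphs) are illustrative assumptions, not the paper's actual formalization; the efficiency formula used is the classical E = T_seq / (p · T_par).

```python
# Illustrative sketch: feasibility of a task-graph -> GPU-graph mapping
# under simple structural restrictions, plus a parallel efficiency metric.
from itertools import product

# Algorithm graph: islands communicate along edges (island parallel model).
tasks = {"i0", "i1", "i2"}
algo_edges = {("i0", "i1"), ("i1", "i2"), ("i2", "i0")}

# GPU model graph: streaming multiprocessors abstracted as a
# communication graph between processing units.
units = {"sm0", "sm1", "sm2"}
gpu_edges = {("sm0", "sm1"), ("sm1", "sm2"), ("sm0", "sm2")}
capacity = {u: 1 for u in units}  # restriction: one island per unit

def communicates(a, b, edges):
    return (a, b) in edges or (b, a) in edges

def is_feasible(mapping):
    """A mapping is feasible if it respects unit capacity and every
    communicating pair of tasks lands on the same or connected units."""
    load = {u: 0 for u in units}
    for u in mapping.values():
        load[u] += 1
    if any(load[u] > capacity[u] for u in units):
        return False
    for a, b in algo_edges:
        ua, ub = mapping[a], mapping[b]
        if ua != ub and not communicates(ua, ub, gpu_edges):
            return False
    return True

def enumerate_feasible():
    """Brute-force the set of feasible mappings (fine for tiny models)."""
    task_list = sorted(tasks)
    return [
        dict(zip(task_list, assignment))
        for assignment in product(sorted(units), repeat=len(task_list))
        if is_feasible(dict(zip(task_list, assignment)))
    ]

def parallel_efficiency(t_seq, t_par, p):
    """Classical criterion E = T_seq / (p * T_par)."""
    return t_seq / (p * t_par)

print(len(enumerate_feasible()))          # -> 6 (all bijections are feasible here)
print(parallel_efficiency(9.0, 4.0, 3))   # -> 0.75
```

In a realistic setting the feasible set would not be enumerated exhaustively; the abstract describes it instead as the search space of a discrete optimization problem solved cooperatively with the optimization problem itself.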

Please cite this article in English as:

Seliverstov E.Yu. Structural mapping of global optimization algorithms to graphics processing unit architecture. Herald of the Bauman Moscow State Technical University, Series Instrument Engineering, 2022, no. 2 (139), pp. 42--59 (in Russ.). DOI: https://doi.org/10.18698/0236-3933-2022-2-42-59


[1] Nvidia Ampere GA102 GPU architecture. nvidia.com: website. Available at: https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf (accessed: 10.09.2021).

[2] Kwok Y.K., Ahmad I. Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Comput. Surv., 1999, vol. 31, no. 4, pp. 406--471. DOI: https://doi.org/10.1145/344588.344618

[3] Sinnen O. Task scheduling for parallel systems. New York, Wiley-Interscience, 2007.

[4] Hendrickson B., Kolda T.G. Graph partitioning models for parallel computing. Parallel Comput., 2000, vol. 26, no. 12, pp. 1519--1534. DOI: https://doi.org/10.1016/S0167-8191(00)00048-X

[5] Chen S., Eshaghian M.M., Wu Y.C. Mapping arbitrary non-uniform task graphs onto arbitrary non-uniform system graphs. Proc. Int. Conf. on Parallel Processing, 1995, vol. 2, pp. 191--195.

[6] Berman F., Snyder L. On mapping parallel algorithms into parallel architectures. J. Parallel Distrib. Comput., 1987, vol. 4, no. 5, pp. 439--458. DOI: https://doi.org/10.1016/0743-7315(87)90018-9

[7] Kirk D.B., Hwu W.W. Programming massively parallel processors. Morgan Kaufmann, Elsevier, 2013.

[8] Izzo D., Rucinski M., Ampatzis C. Parallel global optimisation metaheuristics using an asynchronous island-model. IEEE CEC’09, 2009, pp. 2301--2308. DOI: https://doi.org/10.1109/CEC.2009.4983227

[9] Lorion Y., Bogon T., Timm I.J., et al. An agent based parallel particle swarm optimization --- APPSO. IEEE Swarm Intelligence Symp., 2009, pp. 52--59. DOI: https://doi.org/10.1109/SIS.2009.4937844

[10] Seliverstov E.Y., Karpenko A.P. Hierarchical model of parallel metaheuristic optimization algorithms. Procedia Comput. Sc., 2019, vol. 150, pp. 441--449. DOI: https://doi.org/10.1016/j.procs.2019.02.075

[11] Skillicorn D.B., Talia D. Models and languages for parallel computation. ACM Comput. Surv., 1998, vol. 30, no. 2, pp. 123--169. DOI: https://doi.org/10.1145/280277.280278

[12] Seliverstov E.Yu. [Graph models of GPU]. Sistemy komp’yuternoy matematiki i ikh prilozheniya. Mater. XVIII Mezhdunar. nauch. konf. [Systems of Computer Maths and their Application. Proc. XVIII Int. Sc. Conf.], 2007, pp. 117--119 (in Russ.).

[13] Messmer B.T., Bunke H. Efficient subgraph isomorphism detection: a decomposition approach. IEEE Trans. Knowl. Data Eng., 2000, vol. 12, no. 2, pp. 307--323. DOI: https://doi.org/10.1109/69.842269

[14] Griva I., Nash S., Sofer A. Linear and nonlinear optimization. Philadelphia, SIAM, 2008.

[15] Li X., Tang K., Omidvar M.N., et al. Benchmark functions for the CEC'2013 special session and competition on large-scale global optimization. Available at: http://www.tflsgo.org/assets/cec2018/cec2013-lsgo-benchmark-tech-report.pdf (accessed: 10.09.2021).

[16] Kennedy J., Eberhart R. Swarm intelligence. Morgan Kaufmann, Elsevier, 2001.

[17] Parhami B. Introduction to parallel processing: algorithms and architectures. Nature Switzerland AG, Springer, 1999.