A Model of a Parallel Program for Estimating Its Execution Time
DOI: https://doi.org/10.26089/NumMet.v23r102

Keywords: parallel programs, CUDA, OpenCL, max-plus algebra

Abstract
We consider programs that run on general-purpose graphics processors and are structured as "kernels" containing no loops of indeterminate duration. Such kernels can be implemented, for example, with the CUDA or OpenCL technologies. To estimate the running time of such programs, several models of their execution are proposed, ranging from an entirely "naive" one to more realistic ones. All of them are formulated as matrix expressions in max-plus algebra.
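To illustrate the kind of matrix expressions the abstract refers to, here is a minimal sketch of max-plus matrix multiplication in Python. The scenario (three threads, one stage, the diagonal duration matrix `D`, and all the numbers) is a hypothetical toy example, not the paper's actual model: in the max-plus semiring, "addition" is `max` and "multiplication" is `+`, so a matrix-vector product propagates finish times through dependent stages.

```python
import numpy as np

NEG_INF = -np.inf  # the "zero" (additive identity) of the max-plus semiring


def maxplus_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Max-plus matrix product: (A (x) B)[i, j] = max_k (A[i, k] + B[k, j])."""
    # Broadcast rows of `a` against columns of `b`, add, then max over k.
    return np.max(a[:, :, None] + b[None, :, :], axis=1)


# Hypothetical example: three threads execute one stage of a kernel.
# The state vector x holds each thread's finish time so far; D carries the
# per-thread stage durations on the diagonal and -inf elsewhere (no
# cross-thread dependencies in this naive setting).
x0 = np.zeros((3, 1))                 # all threads start at t = 0
D = np.full((3, 3), NEG_INF)
np.fill_diagonal(D, [2.0, 5.0, 3.0])  # stage duration per thread

x1 = maxplus_matmul(D, x0)            # per-thread finish times after the stage
print(x1.ravel())                     # [2. 5. 3.]
print(np.max(x1))                     # the kernel finishes at t = 5.0
```

Chaining further stages is just more max-plus products, which is what makes a whole loop-free kernel expressible as one matrix expression.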
References
- J. Szuppe, “Boost.Compute: A Parallel Computing Library for C++ Based on OpenCL,” in Proc. 4th Int. Workshop on OpenCL, Vienna, Austria, April 19-21, 2016 (ACM Press, New York, 2016), doi 10.1145/2909437.2909454.
- OpenCV API Reference. GPU-accelerated Computer Vision. https://docs.opencv.org/2.4.13.7/modules/gpu/doc/gpu.html. Cited January 5, 2022.
- L. B. Bosi, M. Mariotti, and A. Santocchia, “GPU Linear Algebra Extensions for GNU/Octave,” J. Phys. Conf. Ser. 368 (1) (2012), doi 10.1088/1742-6596/368/1/012062.
- FFmpeg Hardware Acceleration. https://trac.ffmpeg.org/wiki/HWAccelIntro. Cited January 5, 2022.
- M. Abadi, P. Barham, J. Chen, et al., TensorFlow: A System for Large-Scale Machine Learning. ArXiv preprint: 1605.08695 [cs.DC] (Cornell Univ. Library, Ithaca, 2016). https://arxiv.org/abs/1605.08695. Cited January 5, 2022.
- R. Vuduc, A. Chandramowlishwaran, J. Choi, et al., “On the Limits of GPU Acceleration,” in Proc. 2nd USENIX Conf. on Hot Topics in Parallelism, Berkeley, USA, June 14-15, 2010 (USENIX Association, Berkeley, 2010), doi 10.5555/1863086.1863099.
- CUDA C++ Programming Guide PG-02829-001_v11.5. https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf. Cited January 5, 2022.
- The OpenCL Specification. Khronos OpenCL Working Group. Version V3.0.10, 19 Nov 2021. https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html. Cited January 5, 2022.
- A. A. Kleymenov and N. N. Popova, “A Method for Prediction Dynamic Characteristics of Parallel Programs Based on Static Analysis,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf. 10 (1), 20-31 (2021), doi 10.14529/cmse210102.
- A. A. Kleimenov and N. N. Popova, “A Method for Prediction Execution Time of GPU Programs,” Comp. Nanotechnol. 8 (1), 38-45 (2021), doi 10.33693/2313-223X-2021-8-1-38-45.
- K. Kothapalli, R. Mukherjee, M. S. Rehman, et al., “A Performance Prediction Model for the CUDA GPGPU Platform,” in Proc. 2009 Int. Conf. on High Performance Computing, Kochi, India, December 16-19, 2009 (IEEE Press, New York, 2009), pp. 463-472, doi 10.1109/HIPC.2009.5433179.
- L. G. Valiant, “A Bridging Model for Parallel Computation,” Commun. ACM 33 (8), 103-111 (1990), doi 10.1145/79173.79181.
- S. Fortune and J. Wyllie, “Parallelism in Random Access Machines,” in Proc. 10th ACM Symposium on Theory of Computing, San Diego, USA, May 1-3, 1978 (ACM Press, New York, 1978), pp. 114-118, doi 10.1145/800133.804339.
- P. B. Gibbons, Y. Matias, and V. Ramachandran, “The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms,” SIAM J. Comput. 28 (2), 733-769 (1998).
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, et al., “An Adaptive Performance Modeling Tool for GPU Architectures,” ACM SIGPLAN Not. 45 (5), 105-114 (2010), doi 10.1145/1837853.1693470.
- S. Hong and H. Kim, “An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness,” ACM SIGARCH Comput. Archit. News 37 (3), 152-163 (2009), doi 10.1145/1555754.1555775.
- Y. Zhang and J. D. Owens, “A Quantitative Performance Analysis Model for GPU Architectures,” in Proc. IEEE 17th Int. Symposium on High Performance Computer Architecture, San Antonio, USA, February 12-16, 2011 (IEEE Press, New York, 2011), pp. 382-393, doi 10.1109/HPCA.2011.5749745.
- J. Lai and A. Seznec, “Break Down GPU Execution Time with an Analytical Method,” in Proc. 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, Paris, France, January 23, 2012 (ACM Press, New York, 2012), pp. 33-39, doi 10.1145/2162131.2162136.
- J. Sim, A. Dasgupta, H. Kim, and R. Vuduc, “A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications,” ACM SIGPLAN Not. 47 (8), 11-22 (2012), doi 10.1145/2145816.2145819.
- J.-C. Huang, J. H. Lee, H. Kim, and H.-H. S. Lee, “GPUMech: GPU Performance Modeling Technique Based on Interval Analysis,” in Proc. 47th Annual IEEE/ACM Int. Symposium on Microarchitecture, Cambridge, United Kingdom, December 13-17, 2014 (IEEE Press, Washington, DC, 2014), pp. 268-279, doi 10.1109/MICRO.2014.59.
- M. Amaris, D. Cordeiro, A. Goldman, and R. Y. De Camargo, “A Simple BSP-based Model to Predict Execution Time in GPU Applications,” in Proc. IEEE 22nd Int. Conf. on High Performance Computing, Bengaluru, India, December 16-19, 2015 (IEEE Press, Washington, DC, 2015), pp. 285-294, doi 10.1109/HiPC.2015.34.
- T. C. Carroll and P. W. H. Wong, “An Improved Abstract GPU Model with Data Transfer,” in Proc. 46th Int. Conf. on Parallel Processing Workshops, Bristol, United Kingdom, August 14-17, 2017 (IEEE Press, New York, 2017), pp. 113-120, doi 10.1109/ICPPW.2017.28.
- G. Alavani, K. Varma, and S. Sarkar, “Predicting Execution Time of CUDA Kernel Using Static Analysis,” in IEEE Int. Conf. on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, Australia, December 11-13, 2018 (IEEE Press, New York, 2018), pp. 948-955, doi 10.1109/BDCloud.2018.00139.
- Q. Wang and X. Chu, GPGPU Performance Estimation with Core and Memory Frequency Scaling. ArXiv preprint: 1701.05308v2 [cs.PF] (Cornell Univ. Library, Ithaca, 2018). https://arxiv.org/abs/1701.05308. Cited January 6, 2022.
- S. Salaria, A. Drozd, A. Podobas, and S. Matsuoka, “Learning Neural Representations for Predicting GPU Performance,” in Lecture Notes in Computer Science (Springer, Cham, 2019), Vol. 11501, pp. 40-58, doi 10.1007/978-3-030-20656-7_3.
- L. Braun, S. Nikas, C. Song, et al., A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels. ArXiv preprint: 2001.07104v3 [cs.DC] (Cornell Univ. Library, Ithaca, 2020). https://arxiv.org/abs/2001.07104. Cited January 6, 2022.
- T. T. Dao, J. Kim, S. Seo, et al., “A Performance Model for GPUs with Caches,” IEEE Trans. Parallel Distrib. Syst. 26 (7), 1800-1813 (2015), doi 10.1109/TPDS.2014.2333526.
- A. Karami, F. Khunjush, and S. A. Mirsoleimani, “A Statistical Performance Analyzer Framework for OpenCL Kernels on Nvidia GPUs,” J. Supercomput. 71 (8), 2900-2921 (2015), doi 10.1007/s11227-014-1338-z.
- P. Memarzia and F. Khunjush, “An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code,” J. Inf. Comput. Sci. 10 (2), 124-136 (2015).
- Z. Wang, B. He, W. Zhang, and S. Jiang, “A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs,” in Proc. 2016 IEEE Int. Symposium on High Performance Computer Architecture, Barcelona, Spain, March 12-16, 2016 (IEEE Press, New York, 2016), pp. 114-125, doi 10.1109/HPCA.2016.7446058.
- X. Wang, K. Huang, A. Knoll, and X. Qian, “A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation,” in Proc. IEEE Int. Symposium on High Performance Computer Architecture, Washington, DC, USA, February 16-20, 2019 (IEEE Press, New York, 2019), pp. 506-518, doi 10.1109/HPCA.2019.00062.
- B. Johnston, G. Falzon, and J. Milthorpe, “OpenCL Performance Prediction Using Architecture-Independent Features,” in Proc. Int. Workshop on High Performance Computing & Simulation, Orleans, France, July 16-20, 2018 (IEEE Press, New York, 2018), pp. 561-569, doi 10.1109/HPCS.2018.00095.
- J. Price, “An OpenCL Device Simulator and Debugger.” https://github.com/jrprice/Oclgrind. Cited January 8, 2022.
- H. Wong, M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, “Demystifying GPU Microarchitecture through Microbenchmarking,” in Proc. IEEE Int. Symposium on Performance Analysis of Systems & Software, White Plains, USA, March 28-30, 2010 (IEEE Press, New York, 2010), pp. 235-246, doi 10.1109/ISPASS.2010.5452013.
- V. V. Voevodin and Vl. V. Voevodin, Parallel Computing (BHV-Petersburg, St. Petersburg, 2002) [in Russian].
- R. Cuninghame-Green, Minimax Algebra (Springer, Berlin, 1979), doi 10.1007/978-3-642-48708-8.
- N. K. Krivulin, Methods of Idempotent Algebra for Problems in Modeling and Analysis of Complex Systems (St. Petersburg Univ. Press, St. Petersburg, 2009) [in Russian].
- E. E. Tyrtyshnikov, Fundamentals of Algebra (Fizmatlit, Moscow, 2017) [in Russian].
License
Copyright (c) 2022 V. A. Antonyuk, N. G. Mikheev

This work is licensed under a Creative Commons Attribution 4.0 International License.