A Model of a Parallel Program for Estimating Its Execution Time
Authors
- V. A. Antonyuk
- N. G. Mikheev
Keywords:
parallel programs
CUDA
OpenCL
max-plus algebra
Abstract
We consider programs that run on general-purpose graphics processing units and are represented as "kernels" containing no loops of indeterminate duration. Such kernels can be implemented, for example, using the CUDA or OpenCL technologies. To estimate the execution time of such programs, several models of their operation are proposed, ranging from a completely "naive" one to more realistic ones. All of these models are formulated as matrix expressions in the max-plus algebra.
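In the max-plus algebra, "addition" is the maximum and "multiplication" is the ordinary sum, so a matrix-vector product naturally expresses "a stage finishes when its slowest predecessor finishes, plus the stage duration". The following sketch (in Python with NumPy) illustrates this idea only; the two-thread layout, the stage durations, and the helper mp_matmul are illustrative assumptions and are not the models proposed in the paper.

```python
import numpy as np

NEG_INF = float("-inf")  # the "zero" element of the max-plus algebra

def mp_matmul(A, B):
    """Max-plus matrix product: C[i, j] = max_k (A[i, k] + B[k, j])."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must agree"
    C = np.full((n, p), NEG_INF)
    for i in range(n):
        for j in range(p):
            C[i, j] = np.max(A[i, :] + B[:, j])
    return C

# Illustrative example (not from the paper): two threads, each executing
# two stages separated by a barrier. x0 holds the start times; a diagonal
# matrix D adds the stage durations, NEG_INF meaning "no dependency".
x0 = np.array([[0.0], [0.0]])                    # both threads start at t = 0
D1 = np.array([[3.0, NEG_INF], [NEG_INF, 5.0]])  # stage 1 takes 3 and 5 time units
S  = np.zeros((2, 2))                            # barrier: wait for the slowest thread
D2 = np.array([[2.0, NEG_INF], [NEG_INF, 1.0]])  # stage 2 takes 2 and 1 time units

finish = mp_matmul(D2, mp_matmul(S, mp_matmul(D1, x0)))
print(finish.ravel())  # [7. 6.]: both threads resumed at t = 5 after the barrier
```

Here the all-zero matrix S acts as a synchronization barrier: since 0 is the multiplicative unit of the max-plus algebra, multiplying by S makes every thread wait for the slowest one before the next stage durations are added.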
Section
Parallel Software Tools and Technologies
References
- J. Szuppe, “Boost.Compute: A Parallel Computing Library for C++ Based on OpenCL,” in Proc. 4th Int. Workshop on OpenCL, Vienna, Austria, April 19-21, 2016 (ACM Press, New York, 2016),
doi 10.1145/2909437.2909454.
- OpenCV API Reference. GPU-accelerated Computer Vision.
https://docs.opencv.org/2.4.13.7/modules/gpu/doc/gpu.html . Cited January 5, 2022.
- L. B. Bosi, M. Mariotti, and A. Santocchia, “GPU Linear Algebra Extensions for GNU/Octave,” J. Phys. Conf. Ser. 368 (1) (2012),
doi 10.1088/1742-6596/368/1/012062.
- FFmpeg Hardware Acceleration.
https://trac.ffmpeg.org/wiki/HWAccelIntro . Cited January 5, 2022.
- M. Abadi, P. Barham, J. Chen, et al., TensorFlow: A System for Large-Scale Machine Learning. ArXiv preprint: 1605.08695 [cs.DC] (Cornell Univ. Library, Ithaca, 2016).
https://arxiv.org/abs/1605.08695 . Cited January 5, 2022.
- R. Vuduc, A. Chandramowlishwaran, J. Choi, et al., “On the Limits of GPU Acceleration,” in Proc. 2nd USENIX Conf. on Hot Topics in Parallelism, Berkeley, USA, June 14-15, 2010 (USENIX Association, Berkeley, 2010),
doi 10.5555/1863086.1863099.
- CUDA C++ Programming Guide PG-02829-001_v11.5.
https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf . Cited January 5, 2022.
- The OpenCL Specification. Khronos OpenCL Working Group. Version V3.0.10, 19 Nov 2021.
https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/OpenCL_API.html . Cited January 5, 2022.
- A. A. Kleymenov and N. N. Popova, “A Method for Prediction Dynamic Characteristics of Parallel Programs Based on Static Analysis,” Vestn. Yuzhn. Ural. Gos. Univ. Ser. Vychisl. Mat. Inf., 10 (1), 20-31 (2021).
doi 10.14529/cmse210102.
- A. A. Kleimenov and N. N. Popova, “A Method for Prediction Execution Time of GPU Programs,” Comp. Nanotechnol. 8 (1), 38-45 (2021).
doi 10.33693/2313-223X-2021-8-1-38-45.
- K. Kothapalli, R. Mukherjee, M. S. Rehman, et al., “A Performance Prediction Model for the CUDA GPGPU Platform,” in Proc. 2009 Int. Conf. on High Performance Computing, Kochi, India, December 16-19, 2009 (IEEE Press, New York, 2009), pp. 463-472,
doi 10.1109/HIPC.2009.5433179.
- L. G. Valiant, “A Bridging Model for Parallel Computation,” Commun. ACM 33 (8), 103-111 (1990).
doi 10.1145/79173.79181.
- S. Fortune and J. Wyllie, “Parallelism in Random Access Machines,” in Proc. 10th ACM Symposium on Theory of Computing, San Diego, USA, May 1-3, 1978 (ACM Press, New York, 1978), pp. 114-118,
doi 10.1145/800133.804339.
- P. B. Gibbons, Y. Matias, and V. Ramachandran, “The Queue-Read Queue-Write PRAM Model: Accounting for Contention in Parallel Algorithms,” SIAM J. Comput. 28 (2), 733-769 (1998).
- S. S. Baghsorkhi, M. Delahaye, S. J. Patel, et al., “An Adaptive Performance Modeling Tool for GPU Architectures,” ACM SIGPLAN Not. 45 (5), 105-114 (2010).
doi 10.1145/1837853.1693470.
- S. Hong and H. Kim, “An Analytical Model for a GPU Architecture with Memory-Level and Thread-Level Parallelism Awareness,” ACM SIGARCH Comput. Archit. News 37 (3), 152-163 (2009).
doi 10.1145/1555754.1555775.
- Y. Zhang and J. D. Owens, “A Quantitative Performance Analysis Model for GPU Architectures,” in Proc. IEEE 17th Int. Symposium on High Performance Computer Architecture, San Antonio, USA, February 12-16, 2011 (IEEE Press, New York, 2011), pp. 382-393,
doi 10.1109/HPCA.2011.5749745.
- J. Lai and A. Seznec, “Break Down GPU Execution Time with an Analytical Method,” in Proc. 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, Paris, France, January 23, 2012 (ACM Press, New York, 2012), pp. 33-39,
doi 10.1145/2162131.2162136.
- J. Sim, A. Dasgupta, H. Kim, and R. Vuduc, “A Performance Analysis Framework for Identifying Potential Benefits in GPGPU Applications,” ACM SIGPLAN Not. 47 (8), 11-22 (2012).
doi 10.1145/2145816.2145819.
- J.-C. Huang, J. H. Lee, H. Kim, and H.-H. S. Lee, “GPUMech: GPU Performance Modeling Technique Based on Interval Analysis,” in Proc. 47th Annual IEEE/ACM Int. Symposium on Microarchitecture, Cambridge, United Kingdom, December 13-17, 2014 (IEEE Press, Washington, DC, 2014), pp. 268-279,
doi 10.1109/MICRO.2014.59.
- M. Amaris, D. Cordeiro, A. Goldman, and R. Y. De Camargo, “A Simple BSP-based Model to Predict Execution Time in GPU Applications,” in Proc. IEEE 22nd Int. Conf. on High Performance Computing, Bengaluru, India, December 16-19, 2015 (IEEE Press, Washington, DC, 2015), pp. 285-294,
doi 10.1109/HiPC.2015.34.
- T. C. Carroll and P. W. H. Wong, “An Improved Abstract GPU Model with Data Transfer,” in Proc. 46th Int. Conf. on Parallel Processing Workshops, Bristol, United Kingdom, August 14-17, 2017 (IEEE Press, New York, 2017), pp. 113-120,
doi 10.1109/ICPPW.2017.28.
- G. Alavani, K. Varma, and S. Sarkar, “Predicting Execution Time of CUDA Kernel Using Static Analysis,” in IEEE Int. Conf. on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications (ISPA/IUCC/BDCloud/SocialCom/SustainCom), Melbourne, Australia, December 11-13, 2018 (IEEE Press, New York, 2018), pp. 948-955,
doi 10.1109/BDCloud.2018.00139.
- Q. Wang and X. Chu, GPGPU Performance Estimation with Core and Memory Frequency Scaling. ArXiv preprint: 1701.05308v2 [cs.PF] (Cornell Univ. Library, Ithaca, 2018).
https://arxiv.org/abs/1701.05308 . Cited January 6, 2022.
- S. Salaria, A. Drozd, A. Podobas, and S. Matsuoka, “Learning Neural Representations for Predicting GPU Performance,” in Lecture Notes in Computer Science (Springer, Cham, 2019), Vol. 11501, pp. 40-58.
doi 10.1007/978-3-030-20656-7_3.
- L. Braun, S. Nikas, C. Song, et al., A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels. ArXiv preprint: 2001.07104v3 [cs.DC] (Cornell Univ. Library, Ithaca, 2020).
https://arxiv.org/abs/2001.07104 . Cited January 6, 2022.
- T. T. Dao, J. Kim, S. Seo, et al., “A Performance Model for GPUs with Caches,” IEEE Trans. Parallel Distrib. Syst. 26 (7), 1800-1813 (2015).
doi 10.1109/TPDS.2014.2333526.
- A. Karami, F. Khunjush, and S. A. Mirsoleimani, “A Statistical Performance Analyzer Framework for OpenCL Kernels on Nvidia GPUs,” J. Supercomput. 71 (8), 2900-2921 (2015).
doi 10.1007/s11227-014-1338-z.
- P. Memarzia and F. Khunjush, “An In-depth Study on the Performance Impact of CUDA, OpenCL, and PTX Code,” J. Inf. Comput. Sci. 10 (2), 124-136 (2015).
- Z. Wang, B. He, W. Zhang, and S. Jiang, “A Performance Analysis Framework for Optimizing OpenCL Applications on FPGAs,” in Proc. 2016 IEEE Int. Symposium on High Performance Computer Architecture, Barcelona, Spain, March 12-16, 2016 (IEEE Press, New York, 2016), pp. 114-125,
doi 10.1109/HPCA.2016.7446058.
- X. Wang, K. Huang, A. Knoll, and X. Qian, “A Hybrid Framework for Fast and Accurate GPU Performance Estimation through Source-Level Analysis and Trace-Based Simulation,” in Proc. IEEE Int. Symposium on High Performance Computer Architecture, Washington, DC, USA, February 16-20, 2019 (IEEE Press, New York, 2019), pp. 506-518,
doi 10.1109/HPCA.2019.00062.
- B. Johnston, G. Falzon, and J. Milthorpe, “OpenCL Performance Prediction Using Architecture-Independent Features,” in Proc. Int. Workshop on High Performance Computing & Simulation, Orleans, France, July 16-20, 2018 (IEEE Press, New York, 2018), pp. 561-569,
doi 10.1109/HPCS.2018.00095.
- J. Price, “An OpenCL Device Simulator and Debugger.”
https://github.com/jrprice/Oclgrind . Cited January 8, 2022.
- H. Wong, M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, “Demystifying GPU Microarchitecture through Microbenchmarking,” in Proc. IEEE Int. Symposium on Performance Analysis of Systems & Software, White Plains, USA, March 28-30, 2010 (IEEE Press, New York, 2010), pp. 235-246,
doi 10.1109/ISPASS.2010.5452013.
- V. V. Voevodin and Vl. V. Voevodin, Parallel Computing (BHV-Petersburg, St. Petersburg, 2002) [in Russian].
- R. Cuninghame-Green, Minimax Algebra (Springer, Berlin, 1979).
doi 10.1007/978-3-642-48708-8.
- N. K. Krivulin, Methods of Idempotent Algebra for Problems in Modeling and Analysis of Complex Systems (St. Petersburg Univ. Press, St. Petersburg, 2009) [in Russian].
- E. E. Tyrtyshnikov, Fundamentals of Algebra (Fizmatlit, Moscow, 2017) [in Russian].