A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

arXiv - CS - Distributed, Parallel, and Cluster Computing | Pub Date: 2024-09-16 | DOI: arxiv-2409.10661

Xinyao Yi


Citations: 0

Abstract

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used ways to implement it are: 1) applying multithreading on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) using special parallel architectures such as Single Instruction/Multiple Data (SIMD). Many researchers have worked with these technologies: developing applications, conducting performance analyses, identifying performance bottlenecks, and proposing feasible solutions. However, balancing and optimizing parallel programs remains challenging because of the complexity of parallel algorithms and hardware architectures. Issues such as data transfer between hosts and devices in heterogeneous systems continue to be bottlenecks that limit performance. This work summarizes a large body of information on parallel programming techniques, aiming to present the current state and future development trends of parallel programming, its performance issues, and their solutions. It seeks to give readers an overall picture and to provide background knowledge to support subsequent research.
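As a concrete illustration of the first method named in the abstract (multithreading on a multi-core CPU), the following is a minimal sketch and is not taken from the paper. The function name `parallel_sum` and its chunking scheme are assumptions chosen for illustration: the input is split into contiguous chunks, each thread reduces its own chunk, and the partial results are combined at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the paper): sums `data` by giving each
// thread one contiguous chunk and then adding up the per-thread results.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    // Ceiling division so that every element falls into some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each thread accumulates locally and writes only its own slot,
            // so no lock is needed. Depending on compiler and flags, a simple
            // loop like this is also a candidate for SIMD auto-vectorization.
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;
        });
    }
    for (auto& w : workers) w.join();  // wait for all chunks to finish
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling `parallel_sum(values, 4)` on a `std::vector<int>` returns the same total as a sequential loop. Whether it runs faster depends on the input size, the number of cores, and the thread start-up cost, which is the kind of balancing problem the abstract refers to.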


