A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Xinyao Yi
arXiv - CS - Distributed, Parallel, and Cluster Computing
Published: 2024-09-16
DOI: arxiv-2409.10661 (https://doi.org/arxiv-2409.10661)

Abstract

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used ways to implement it are: 1) applying multithreading on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) utilizing special parallel architectures such as Single Instruction/Multiple Data (SIMD). Many researchers have worked with these parallel technologies, developing applications, conducting performance analyses, identifying performance bottlenecks, and proposing feasible solutions. However, balancing and optimizing parallel programs remains challenging because of the complexity of parallel algorithms and hardware architectures. Issues such as data transfer between hosts and devices in heterogeneous systems continue to be bottlenecks that limit performance. This work summarizes a broad body of information on parallel programming techniques, aiming to present the current state and future development trends of parallel programming, its performance issues, and their solutions. It seeks to give readers an overall picture and to provide background knowledge that supports subsequent research.