A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
Xinyao Yi
arXiv:2409.10661, arXiv - CS - Distributed, Parallel, and Cluster Computing, published 2024-09-16
Citations: 0
Abstract
Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2) incorporating powerful parallel computing devices such as GPUs, FPGAs, and other accelerators; and 3) utilizing special parallel architectures such as Single Instruction/Multiple Data (SIMD).

Many researchers have worked with these parallel technologies: developing applications, conducting performance analyses, identifying performance bottlenecks, and proposing feasible solutions. However, balancing and optimizing parallel programs remain challenging due to the complexity of parallel algorithms and hardware architectures. Issues such as data transfer between hosts and devices in heterogeneous systems continue to be bottlenecks that limit performance.

This work summarizes a large body of information on various parallel programming techniques, aiming to present the current state and future development trends of parallel programming, its performance issues, and their solutions. It seeks to give readers an overall picture and provide background knowledge to support subsequent research.