通过FPGA自适应DMA控制器提高多媒体应用中GPU的性能

IF 0.6 Q4 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS International Journal of Pervasive Computing and Communications Pub Date : 2022-10-17 DOI:10.1108/ijpcc-06-2022-0241
S. B, K. E.
{"title":"通过FPGA自适应DMA控制器提高多媒体应用中GPU的性能","authors":"S. B, K. E.","doi":"10.1108/ijpcc-06-2022-0241","DOIUrl":null,"url":null,"abstract":"\nPurpose\nDeep learning techniques are unavoidable in a variety of domains such as health care, computer vision, cyber-security and so on. These algorithms demand high data transfers but require bottlenecks in achieving the high speed and low latency synchronization while being implemented in the real hardware architectures. Though direct memory access controller (DMAC) has gained a brighter light of research for achieving bulk data transfers, existing direct memory access (DMA) systems continue to face the challenges of achieving high-speed communication. The purpose of this study is to develop an adaptive-configured DMA architecture for bulk data transfer with high throughput and less time-delayed computation.\n\n\nDesign/methodology/approach\nThe proposed methodology consists of a heterogeneous computing system integrated with specialized hardware and software. For the hardware, the authors propose an field programmable gate array (FPGA)-based DMAC, which transfers the data to the graphics processing unit (GPU) using PCI-Express. The workload characterization technique is designed using Python software and is implementable for the advanced risk machine Cortex architecture with a suitable communication interface. This module offloads the input streams of data to the FPGA and initiates the FPGA for the control flow of data to the GPU that can achieve efficient processing.\n\n\nFindings\nThis paper presents an evaluation of a configurable workload-based DMA controller for collecting the data from the input devices and concurrently applying it to the GPU architecture, bypassing the hardware and software extraneous copies and bottlenecks via PCI Express. It also investigates the usage of adaptive DMA memory buffer allocation and workload characterization techniques. The proposed DMA architecture is compared with the other existing DMA architectures in which the performance of the proposed DMAC outperforms traditional DMA by achieving 96% throughput and 50% less latency synchronization.\n\n\nOriginality/value\nThe proposed gated recurrent unit has produced 95.6% accuracy in characterization of the workloads into heavy, medium and normal. The proposed model has outperformed the other algorithms and proves its strength for workload characterization.\n","PeriodicalId":43952,"journal":{"name":"International Journal of Pervasive Computing and Communications","volume":null,"pages":null},"PeriodicalIF":0.6000,"publicationDate":"2022-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Improving GPU performance in multimedia applications through FPGA based adaptive DMA controller\",\"authors\":\"S. B, K. E.\",\"doi\":\"10.1108/ijpcc-06-2022-0241\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"\\nPurpose\\nDeep learning techniques are unavoidable in a variety of domains such as health care, computer vision, cyber-security and so on. These algorithms demand high data transfers but require bottlenecks in achieving the high speed and low latency synchronization while being implemented in the real hardware architectures. Though direct memory access controller (DMAC) has gained a brighter light of research for achieving bulk data transfers, existing direct memory access (DMA) systems continue to face the challenges of achieving high-speed communication. The purpose of this study is to develop an adaptive-configured DMA architecture for bulk data transfer with high throughput and less time-delayed computation.\\n\\n\\nDesign/methodology/approach\\nThe proposed methodology consists of a heterogeneous computing system integrated with specialized hardware and software. For the hardware, the authors propose an field programmable gate array (FPGA)-based DMAC, which transfers the data to the graphics processing unit (GPU) using PCI-Express. The workload characterization technique is designed using Python software and is implementable for the advanced risk machine Cortex architecture with a suitable communication interface. This module offloads the input streams of data to the FPGA and initiates the FPGA for the control flow of data to the GPU that can achieve efficient processing.\\n\\n\\nFindings\\nThis paper presents an evaluation of a configurable workload-based DMA controller for collecting the data from the input devices and concurrently applying it to the GPU architecture, bypassing the hardware and software extraneous copies and bottlenecks via PCI Express. It also investigates the usage of adaptive DMA memory buffer allocation and workload characterization techniques. The proposed DMA architecture is compared with the other existing DMA architectures in which the performance of the proposed DMAC outperforms traditional DMA by achieving 96% throughput and 50% less latency synchronization.\\n\\n\\nOriginality/value\\nThe proposed gated recurrent unit has produced 95.6% accuracy in characterization of the workloads into heavy, medium and normal. The proposed model has outperformed the other algorithms and proves its strength for workload characterization.\\n\",\"PeriodicalId\":43952,\"journal\":{\"name\":\"International Journal of Pervasive Computing and Communications\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.6000,\"publicationDate\":\"2022-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Pervasive Computing and Communications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1108/ijpcc-06-2022-0241\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Pervasive Computing and Communications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1108/ijpcc-06-2022-0241","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 1

摘要

目的深度学习技术在医疗保健、计算机视觉、网络安全等领域是不可避免的。这些算法需要高数据传输,但在实际硬件架构中实现时,在实现高速和低延迟同步方面需要瓶颈。尽管直接存储器存取控制器(DMAC)在实现大容量数据传输方面获得了更光明的研究前景,但现有的直接存储器存取(DMA)系统仍然面临着实现高速通信的挑战。本研究的目的是开发一种自适应配置的DMA架构,用于具有高吞吐量和较少时延计算的批量数据传输。设计/方法论/方法论所提出的方法论由集成了专用硬件和软件的异构计算系统组成。在硬件方面,作者提出了一种基于现场可编程门阵列(FPGA)的DMAC,它使用PCI Express将数据传输到图形处理单元(GPU)。工作负载表征技术是使用Python软件设计的,并且可以在具有适当通信接口的高级风险机器Cortex架构中实现。该模块将输入数据流卸载到FPGA,并启动FPGA以控制到GPU的数据流,从而实现高效处理。发现本文对一种基于可配置工作负载的DMA控制器进行了评估,该控制器用于从输入设备收集数据,并同时将其应用于GPU架构,通过PCI Express绕过硬件和软件无关的副本和瓶颈。它还研究了自适应DMA内存缓冲区分配和工作负载表征技术的使用。将所提出的DMA架构与其他现有DMA架构进行比较,其中所提出的DMAC的性能优于传统DMA,实现了96%的吞吐量和50%的延迟同步。独创性/价值所提出的门控递归单元在将工作量分为重、中和正常时的准确率为95.6%。所提出的模型优于其他算法,并证明了其在工作负载表征方面的优势。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Improving GPU performance in multimedia applications through FPGA based adaptive DMA controller
Purpose Deep learning techniques are unavoidable in a variety of domains such as health care, computer vision, cyber-security and so on. These algorithms demand high data transfers but require bottlenecks in achieving the high speed and low latency synchronization while being implemented in the real hardware architectures. Though direct memory access controller (DMAC) has gained a brighter light of research for achieving bulk data transfers, existing direct memory access (DMA) systems continue to face the challenges of achieving high-speed communication. The purpose of this study is to develop an adaptive-configured DMA architecture for bulk data transfer with high throughput and less time-delayed computation. Design/methodology/approach The proposed methodology consists of a heterogeneous computing system integrated with specialized hardware and software. For the hardware, the authors propose an field programmable gate array (FPGA)-based DMAC, which transfers the data to the graphics processing unit (GPU) using PCI-Express. The workload characterization technique is designed using Python software and is implementable for the advanced risk machine Cortex architecture with a suitable communication interface. This module offloads the input streams of data to the FPGA and initiates the FPGA for the control flow of data to the GPU that can achieve efficient processing. Findings This paper presents an evaluation of a configurable workload-based DMA controller for collecting the data from the input devices and concurrently applying it to the GPU architecture, bypassing the hardware and software extraneous copies and bottlenecks via PCI Express. It also investigates the usage of adaptive DMA memory buffer allocation and workload characterization techniques. The proposed DMA architecture is compared with the other existing DMA architectures in which the performance of the proposed DMAC outperforms traditional DMA by achieving 96% throughput and 50% less latency synchronization. Originality/value The proposed gated recurrent unit has produced 95.6% accuracy in characterization of the workloads into heavy, medium and normal. The proposed model has outperformed the other algorithms and proves its strength for workload characterization.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
International Journal of Pervasive Computing and Communications
International Journal of Pervasive Computing and Communications COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS-
CiteScore
6.60
自引率
0.00%
发文量
54
期刊最新文献
Big data challenges and opportunities in Internet of Vehicles: a systematic review Cooperative optimization techniques in distributed MAC protocols – a survey Novel communication system for buried water pipe monitoring using acoustic signal propagation along the pipe A new predictive approach for the MAC layer misbehavior in IEEE 802.11 networks Clustering based EO with MRF technique for effective load balancing in cloud computing
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1