{"title":"图形处理单元上卷积神经网络的可配置纹理单元","authors":"Yi-Hsiang Chen, Shao-Yi Chien","doi":"10.1109/AICAS.2019.8771629","DOIUrl":null,"url":null,"abstract":"To accelerate Convolutional Neural Networks (CNN) operations on resource-limited mobile graphics processing units (GPUs), taking advantage of the common characteristics between texture filtering and convolutional layer, we propose a configurable texture unit called tensor and texture unit (TTU) to offload the computation from shader cores. With adding a new datapath for loading weight parameters in the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters to fixed-point format, we make the texture unit be able to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating TTU into a GPU system in RTL level. Experimental results show that 18.54x speedup can be achieved with the overhead of only 8.5% compared with a GPU system with a traditional texture unit.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"351 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units\",\"authors\":\"Yi-Hsiang Chen, Shao-Yi Chien\",\"doi\":\"10.1109/AICAS.2019.8771629\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To accelerate Convolutional Neural Networks (CNN) operations on resource-limited mobile graphics processing units (GPUs), taking advantage of the common characteristics between texture filtering and convolutional layer, we propose a configurable texture unit called tensor and texture unit (TTU) to offload the computation from shader cores. With adding a new datapath for loading weight parameters in the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters to fixed-point format, we make the texture unit be able to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating TTU into a GPU system in RTL level. 
Experimental results show that 18.54x speedup can be achieved with the overhead of only 8.5% compared with a GPU system with a traditional texture unit.\",\"PeriodicalId\":273095,\"journal\":{\"name\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"volume\":\"351 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AICAS.2019.8771629\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS.2019.8771629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units
To accelerate Convolutional Neural Network (CNN) operations on resource-limited mobile graphics processing units (GPUs), we exploit the common structure shared by texture filtering and convolutional layers and propose a configurable texture unit, called the tensor and texture unit (TTU), that offloads computation from the shader cores. By adding a new datapath for loading weight parameters into the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters into a fixed-point format, we enable the texture unit to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating the TTU into a GPU system at the RTL level. Experimental results show that an 18.54x speedup can be achieved with an overhead of only 8.5% compared with a GPU system with a traditional texture unit.
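The key observation behind the TTU is that texture filtering and convolution share the same multiply-accumulate structure: both compute a weighted sum over a neighborhood of texels, differing mainly in where the weights come from. The sketch below illustrates that structural similarity in C. It is a minimal illustration, not the paper's actual TTU datapath: the 3x3 kernel size, the Q8.8 fixed-point format, and all function names are assumptions introduced for the example, since the abstract does not specify bit widths or kernel dimensions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical Q8.8 fixed-point format (8 integer bits, 8 fractional
 * bits). The paper packs inputs and weights into a fixed-point format,
 * but the exact bit widths are not given, so Q8.8 is an assumption. */
typedef int16_t q8_8;
#define TO_Q8_8(f)   ((q8_8)((f) * 256.0f))
#define FROM_Q8_8(x) ((float)(x) / 256.0f)

/* Bilinear texture filtering: a weighted sum of the 4 nearest texels,
 * with weights derived from the sample position's fractional offsets
 * (fx, fy) rather than loaded from memory. */
float bilinear(float tex[2][2], float fx, float fy) {
    float w00 = (1 - fx) * (1 - fy), w10 = fx * (1 - fy);
    float w01 = (1 - fx) * fy,       w11 = fx * fy;
    return w00 * tex[0][0] + w10 * tex[0][1]
         + w01 * tex[1][0] + w11 * tex[1][1];
}

/* A 3x3 convolution: the same multiply-accumulate pattern, but the
 * weights are learned parameters loaded from memory. This shared
 * structure is what makes a filtering unit a candidate for
 * reconfiguration, given a datapath for loading the weights. */
float conv3x3_fixed(q8_8 in[3][3], q8_8 w[3][3]) {
    int32_t acc = 0;                    /* wide accumulator for Q16.16 products */
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++)
            acc += (int32_t)in[y][x] * (int32_t)w[y][x];
    /* Renormalize Q16.16 back to Q8.8; saturation omitted for brevity. */
    return FROM_Q8_8((q8_8)(acc >> 8));
}

int main(void) {
    /* Constant input patch convolved with box-filter weights: result ~1.0. */
    q8_8 in[3][3], w[3][3];
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++) {
            in[y][x] = TO_Q8_8(1.0f);
            w[y][x]  = TO_Q8_8(1.0f / 9.0f);
        }
    printf("conv3x3:  %f\n", conv3x3_fixed(in, w));

    /* Bilinear sample halfway between a black and a white column: 0.5. */
    float tex[2][2] = {{0.0f, 1.0f}, {0.0f, 1.0f}};
    printf("bilinear: %f\n", bilinear(tex, 0.5f, 0.5f));
    return 0;
}
```

Read this way, the modifications listed in the abstract map onto the sketch: the new weight-loading datapath supplies the `w` operands that bilinear filtering derives internally, the texture cache already serves the `in` neighborhood fetches, and the fixed-point packing keeps the multiply-accumulate hardware narrow.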