{"title":"图形处理单元上卷积神经网络的可配置纹理单元","authors":"Yi-Hsiang Chen, Shao-Yi Chien","doi":"10.1109/AICAS.2019.8771629","DOIUrl":null,"url":null,"abstract":"To accelerate Convolutional Neural Networks (CNN) operations on resource-limited mobile graphics processing units (GPUs), taking advantage of the common characteristics between texture filtering and convolutional layer, we propose a configurable texture unit called tensor and texture unit (TTU) to offload the computation from shader cores. With adding a new datapath for loading weight parameters in the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters to fixed-point format, we make the texture unit be able to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating TTU into a GPU system in RTL level. Experimental results show that 18.54x speedup can be achieved with the overhead of only 8.5% compared with a GPU system with a traditional texture unit.","PeriodicalId":273095,"journal":{"name":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","volume":"351 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units\",\"authors\":\"Yi-Hsiang Chen, Shao-Yi Chien\",\"doi\":\"10.1109/AICAS.2019.8771629\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To accelerate Convolutional Neural Networks (CNN) operations on resource-limited mobile graphics processing units (GPUs), taking advantage of the common characteristics between texture filtering and convolutional layer, we propose a configurable texture unit called tensor and texture unit (TTU) to offload the computation from shader cores. With adding a new datapath for loading weight parameters in the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters to fixed-point format, we make the texture unit be able to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating TTU into a GPU system in RTL level. 
Experimental results show that 18.54x speedup can be achieved with the overhead of only 8.5% compared with a GPU system with a traditional texture unit.\",\"PeriodicalId\":273095,\"journal\":{\"name\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"volume\":\"351 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/AICAS.2019.8771629\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AICAS.2019.8771629","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Configurable Texture Unit for Convolutional Neural Networks on Graphics Processing Units
To accelerate Convolutional Neural Network (CNN) operations on resource-limited mobile graphics processing units (GPUs), we exploit the common structure shared by texture filtering and convolutional layers and propose a configurable texture unit, called the tensor and texture unit (TTU), that offloads computation from the shader cores. By adding a new datapath for loading weight parameters into the texture unit, reusing the original texture cache, increasing the flexibility of the filtering unit, and packing the input data and weight parameters into a fixed-point format, we enable the texture unit to support convolutional and pooling layers with only small modifications. The proposed architecture is verified by integrating the TTU into a GPU system at the RTL level. Experimental results show that an 18.54x speedup can be achieved with an overhead of only 8.5% compared with a GPU system with a traditional texture unit.
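The key observation behind the TTU is that texture filtering and convolution share the same multiply-accumulate structure: both compute a weighted sum over a neighborhood of texels, differing mainly in where the weights come from. The sketch below illustrates that structural similarity in C. It is a minimal illustration, not the paper's actual TTU datapath: the 3x3 kernel size, the Q8.8 fixed-point format, and all function names are assumptions introduced for the example, since the abstract does not specify bit widths or kernel dimensions.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical Q8.8 fixed-point format (8 integer bits, 8 fractional
 * bits). The paper packs inputs and weights into a fixed-point format,
 * but the exact bit widths are not given, so Q8.8 is an assumption. */
typedef int16_t q8_8;
#define TO_Q8_8(f)   ((q8_8)((f) * 256.0f))
#define FROM_Q8_8(x) ((float)(x) / 256.0f)

/* Bilinear texture filtering: a weighted sum of the 4 nearest texels,
 * with weights derived from the sample position's fractional offsets
 * (fx, fy) rather than loaded from memory. */
float bilinear(float tex[2][2], float fx, float fy) {
    float w00 = (1 - fx) * (1 - fy), w10 = fx * (1 - fy);
    float w01 = (1 - fx) * fy,       w11 = fx * fy;
    return w00 * tex[0][0] + w10 * tex[0][1]
         + w01 * tex[1][0] + w11 * tex[1][1];
}

/* A 3x3 convolution: the same multiply-accumulate pattern, but the
 * weights are learned parameters loaded from memory. This shared
 * structure is what makes a filtering unit a candidate for
 * reconfiguration, given a datapath for loading the weights. */
float conv3x3_fixed(q8_8 in[3][3], q8_8 w[3][3]) {
    int32_t acc = 0;                    /* wide accumulator for Q16.16 products */
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++)
            acc += (int32_t)in[y][x] * (int32_t)w[y][x];
    /* Renormalize Q16.16 back to Q8.8; saturation omitted for brevity. */
    return FROM_Q8_8((q8_8)(acc >> 8));
}

int main(void) {
    /* Constant input patch convolved with box-filter weights: result ~1.0. */
    q8_8 in[3][3], w[3][3];
    for (int y = 0; y < 3; y++)
        for (int x = 0; x < 3; x++) {
            in[y][x] = TO_Q8_8(1.0f);
            w[y][x]  = TO_Q8_8(1.0f / 9.0f);
        }
    printf("conv3x3:  %f\n", conv3x3_fixed(in, w));

    /* Bilinear sample halfway between a black and a white column: 0.5. */
    float tex[2][2] = {{0.0f, 1.0f}, {0.0f, 1.0f}};
    printf("bilinear: %f\n", bilinear(tex, 0.5f, 0.5f));
    return 0;
}
```

Read this way, the modifications listed in the abstract map onto the sketch: the new weight-loading datapath supplies the `w` operands that bilinear filtering derives internally, the texture cache already serves the `in` neighborhood fetches, and the fixed-point packing keeps the multiply-accumulate hardware narrow.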