Efficient parallel implementation of morphological operation on GPU and FPGA
Authors: Teng Li, Y. Dou, Jingfei Jiang, Jing Gao
Published in: Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), 2014-12-15
DOI: 10.1109/SPAC.2014.6982728 (https://doi.org/10.1109/SPAC.2014.6982728)
Morphological operations are among the most powerful and versatile tools in image and video processing, with applications ranging from object recognition and feature extraction to moving-object detection in computer vision, where real-time, high-performance processing is required. However, the throughput of morphological operations is constrained by their convolution-like characteristic. In this paper, we analyze the parallelism of morphological operations and present parallel implementations on the graphics processing unit (GPU) and the field-programmable gate array (FPGA). For the GPU platform, we propose optimized schemes based on global memory, texture memory, and shared memory, achieving a throughput of 942.63 Mbps with a 3×3 structuring element. For the FPGA platform, we present an optimized method based on the traditional delay-line architecture; with a 3×3 structuring element, it achieves a throughput of 462.64 Mbps.
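To make the delay-line idea concrete, the following is a minimal software sketch in Python and not the authors' FPGA design, whose optimizations the abstract does not describe. It assumes a flat (all-ones) 3×3 structuring element, grayscale pixels, and replicated borders. The function names erode3x3_direct and erode3x3_delay_line are hypothetical. Pixels are streamed in raster order through two row-length FIFOs (the delay lines) that feed a 3×3 window register, and the result is compared against a direct neighborhood minimum.

```python
from collections import deque


def erode3x3_direct(img, op=min):
    """Reference 3x3 operation: op over each pixel's 3x3 neighborhood.

    op=min gives erosion, op=max gives dilation (flat structuring element).
    Borders are replicated, which is an assumption of this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = op(
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
    return out


def erode3x3_delay_line(img, op=min):
    """Streaming model of a delay-line architecture (software sketch only)."""
    h, w = len(img), len(img[0])
    # Replicate-pad by one pixel so every output has a full 3x3 window.
    padded = [[row[0]] + list(row) + [row[-1]] for row in img]
    padded = [padded[0]] + padded + [padded[-1]]
    pw = w + 2

    # Two FIFOs, each one padded row long: the "delay lines".
    line1 = deque([0] * pw, maxlen=pw)  # delays the stream by one row
    line2 = deque([0] * pw, maxlen=pw)  # delays the stream by two rows
    window = [[0] * 3 for _ in range(3)]  # 3x3 shift-register window
    out = [[0] * w for _ in range(h)]

    for y in range(h + 2):
        for x in range(pw):
            p0 = padded[y][x]  # incoming pixel (current row)
            p1 = line1[0]      # same column, one row earlier
            p2 = line2[0]      # same column, two rows earlier
            line2.append(p1)   # maxlen drops the oldest entry automatically
            line1.append(p0)
            # Shift the window one column left and insert the new column.
            for r, p in enumerate((p2, p1, p0)):
                window[r] = [window[r][1], window[r][2], p]
            # The window is fully valid once three rows and columns are seen.
            if y >= 2 and x >= 2:
                out[y - 2][x - 2] = op(op(row) for row in window)
    return out


if __name__ == "__main__":
    image = [[5, 3, 8, 1], [7, 2, 9, 4], [6, 6, 0, 3]]
    assert erode3x3_delay_line(image) == erode3x3_direct(image)
```

Each incoming pixel is read once and reused from the two row buffers, which is what lets a hardware pipeline of this kind emit one result per clock after the initial two-row latency. Passing op=max to either function gives dilation instead of erosion.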