Pub Date: 2012-08-01 DOI: 10.1109/CNNA.2012.6331437
Antal Hiba, Zoltán Nagy, Miklos Ruszinko
Many real-life applications of processor arrays suffer from memory bandwidth limitations. In many cases an unstructured mesh is given (computation on sensor data, simulations of physical systems governed by PDEs), where the vertices represent computations and the edges represent the dependencies between them. Utilization of the processing elements (PEs) during these computations depends mainly on the node indexing of the mesh. If adjacent nodes are stored close to each other in main memory, reloading of node data can be significantly reduced. On an FPGA, memory accesses can be fully determined by the designer. The mesh and an ordering of its nodes define the graph bandwidth, which determines the minimum on-chip memory size needed to avoid reloading nodes from off-chip memory. If the required on-chip memory size exceeds the available resources, the mesh must be divided into parts. This paper presents a novel geometry-based method that constructs reordered parts from a given unstructured mesh such that each part meets predefined constraints on graph bandwidth.
{"title":"Memory access optimization for computations on unstructured meshes","authors":"Antal Hiba, Zoltán Nagy, Miklos Ruszinko","doi":"10.1109/CNNA.2012.6331437","DOIUrl":"https://doi.org/10.1109/CNNA.2012.6331437","journal":{"name":"2012 13th International Workshop on Cellular Nanoscale Networks and their Applications"},"publicationDate":"2012-08-01"}
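To make the notion of graph bandwidth in the abstract above concrete, the following minimal sketch computes it for a small mesh under two node orderings. It uses the standard definition (the largest index distance between the endpoints of any edge). The function names, the toy mesh, and the breadth-first reordering are illustrative assumptions; the breadth-first pass is a generic heuristic and is not the geometry-based method of the paper.

```python
from collections import deque


def graph_bandwidth(edges, order):
    """Largest index distance between the endpoints of any edge under `order`."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u, v in edges), default=0)


def bfs_order(edges, start):
    """Breadth-first ordering of a connected graph.

    Generic reordering heuristic used only for illustration;
    it is NOT the geometry-based method described in the abstract.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in sorted(adj[node]):
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return order


# Hypothetical toy mesh: a 6-node path a-b-c-d-e-f.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]

# The same nodes stored in a scattered order in memory.
scattered = ["a", "d", "b", "e", "c", "f"]
reordered = bfs_order(edges, "a")

bw_scattered = graph_bandwidth(edges, scattered)
bw_reordered = graph_bandwidth(edges, reordered)
print(bw_scattered, bw_reordered)
```

A smaller bandwidth means that neighbouring nodes sit closer together in memory, which is the property the abstract links to the required on-chip memory size.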
Pub Date: 2012-08-01 DOI: 10.1109/CNNA.2012.6331469
M. Colandrea, M. Magistris, C. Petrarca, M. Bernardo, S. Manfredi
We describe the realization of a new experimental setup for the analysis and characterization of complex networks of Chua's circuits. The setup offers full configurability of the node parameters and of the network structure (topology and link impedances), and is designed to scale easily to a large number of nodes. Control of the network and data acquisition are automated by means of USB-interfaced boards. A portable version of the setup with 8 nodes has been realized for demonstration purposes.
{"title":"Realization of a fully configurable complex network of non linear Chua's oscillators","authors":"M. Colandrea, M. Magistris, C. Petrarca, M. Bernardo, S. Manfredi","doi":"10.1109/CNNA.2012.6331469","DOIUrl":"https://doi.org/10.1109/CNNA.2012.6331469","journal":{"name":"2012 13th International Workshop on Cellular Nanoscale Networks and their Applications"},"publicationDate":"2012-08-01"}
Pub Date: 2012-08-01 DOI: 10.1109/CNNA.2012.6331421
Bin Wang, P. Dudek
This paper introduces a mapping method for adding a coarse grain (multiple pixels per processor) processing mode to massively parallel cellular processor arrays. The main motivation is to give the fine grain pixel-parallel processor array the ability to process images with a higher resolution than the array itself, in a way that is transparent to the programmer. The proposed method performs the mapping entirely during code compilation, which has four main advantages. Firstly, there is no extra overhead during processing. Secondly, source code written for fine grain mode can be used in coarse grain mode without modification. Thirdly, the method places no restrictions on the number of pixels stored in a processing element. Finally, the method is easy to implement, as it requires no modifications to the hardware design of the pixel-parallel processor array or its controller, only to the software compiler. The mapping method and its software implementation are presented in this paper.
{"title":"Coarse grain mapping method for image processing on fine grain cellular processor arrays","authors":"Bin Wang, P. Dudek","doi":"10.1109/CNNA.2012.6331421","DOIUrl":"https://doi.org/10.1109/CNNA.2012.6331421","journal":{"name":"2012 13th International Workshop on Cellular Nanoscale Networks and their Applications"},"publicationDate":"2012-08-01"}
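The abstract above does not say how pixels are assigned to processing elements, so the following is only a sketch of the index arithmetic that a coarse grain mode implies, assuming a simple block tiling in which each PE holds one contiguous block of the image. The function names and the tiling itself are hypothetical. In the paper this resolution happens at compile time; the runtime functions here merely illustrate the kind of address translation a compiler could perform statically.

```python
def map_pixel(row, col, image_shape, array_shape):
    """Map an image pixel to (PE coordinates, local slot) under an ASSUMED block tiling."""
    img_h, img_w = image_shape
    arr_h, arr_w = array_shape
    if img_h % arr_h or img_w % arr_w:
        raise ValueError("this sketch needs the image size to be a multiple of the array size")
    block_h, block_w = img_h // arr_h, img_w // arr_w
    pe = (row // block_h, col // block_w)        # which processing element owns the pixel
    local = (row % block_h, col % block_w)       # where the pixel sits inside that PE
    return pe, local


def neighbour_access(row, col, d_row, d_col, image_shape, array_shape):
    """Classify a neighbour read as 'local' (same PE), 'remote' (other PE) or 'boundary'."""
    img_h, img_w = image_shape
    n_row, n_col = row + d_row, col + d_col
    if not (0 <= n_row < img_h and 0 <= n_col < img_w):
        return "boundary", None
    pe, _ = map_pixel(row, col, image_shape, array_shape)
    n_pe, n_local = map_pixel(n_row, n_col, image_shape, array_shape)
    kind = "local" if n_pe == pe else "remote"
    return kind, (n_pe, n_local)


# Hypothetical sizes: a 256x256 image on a 128x128 array, i.e. 2x2 pixels per PE.
image_shape, array_shape = (256, 256), (128, 128)

print(map_pixel(5, 6, image_shape, array_shape))
print(neighbour_access(5, 6, 0, 1, image_shape, array_shape))  # east neighbour of (5, 6)
print(neighbour_access(5, 7, 0, 1, image_shape, array_shape))  # east neighbour of (5, 7)
```

Under this assumed tiling, a neighbour read either stays inside the same PE or crosses to an adjacent one, and that distinction is what a fine grain neighbour instruction would have to be translated into.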
Pub Date: 2011-05-15 DOI: 10.1109/ISCAS.2011.5937707
E. Cesur, N. Yildiz, V. Tavsanoglu
In this paper, a new Cellular Neural Network (CNN) structure for implementing two-dimensional Gabor-type filters is proposed as an improvement over our previous design. The structure is coded in VHDL and realized on a state-of-the-art Altera Stratix IV 230 FPGA. The prototype supports Full-HD 1080p resolution at a 60 Hz frame rate. One dedicated processor is used for each Euler iteration, with the time step set equal to the optimum step size, and 50 iterations are implemented. The input/output, control, RAM and communication blocks of the realization are taken from our second-generation real-time CNN emulator (RTCNNP-v2).
{"title":"Demo: An improved FPGA implementation of CNN Gabor-type Filters","authors":"E. Cesur, N. Yildiz, V. Tavsanoglu","doi":"10.1109/ISCAS.2011.5937707","DOIUrl":"https://doi.org/10.1109/ISCAS.2011.5937707","journal":{"name":"2012 13th International Workshop on Cellular Nanoscale Networks and their Applications"},"publicationDate":"2011-05-15"}
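The "one processor per Euler iteration" arrangement in the abstract above can be pictured as a chain of identical stages, each applying one forward-Euler step. The sketch below is a software illustration only, assuming a generic linear CNN whose output equals its state, dx/dt = -x + A*x + B*u + z, with real-valued templates and zero boundary values. The template values, step size and function names are hypothetical; the actual Gabor-type templates and the optimum step size are not given in the abstract.

```python
def euler_stage(x, u, a, b, z, h):
    """One forward-Euler step of dx/dt = -x + A*x + B*u + z on a 2-D grid.

    A and B are square templates applied as neighbourhood sums;
    cells outside the grid are treated as zero (an assumption of this sketch).
    """
    rows, cols = len(x), len(x[0])

    def apply_template(img, t, i, j):
        r = len(t) // 2
        total = 0.0
        for di in range(-r, r + 1):
            for dj in range(-r, r + 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    total += t[di + r][dj + r] * img[ii][jj]
        return total

    return [
        [
            x[i][j] + h * (-x[i][j] + apply_template(x, a, i, j)
                           + apply_template(u, b, i, j) + z)
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def run_pipeline(u, a, b, z, h, stages=50):
    """Chain `stages` Euler steps, mimicking one dedicated stage per iteration."""
    x = [[0.0] * len(u[0]) for _ in u]
    for _ in range(stages):
        x = euler_stage(x, u, a, b, z, h)
    return x


# Hypothetical placeholder templates: no feedback coupling, identity feed-forward.
a = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
b = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
u = [[1.0, 2.0], [3.0, 4.0]]

result = run_pipeline(u, a, b, z=0.0, h=0.5, stages=50)
print(result)
```

With these placeholder templates the dynamics reduce to dx/dt = -x + u, so the state converges towards the input. This gives a simple sanity check that the stage chain behaves as a forward-Euler integrator.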