Fast Poisson solver preconditioned method for robust power grid analysis
Published in: 2011 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105381
Jianlei Yang, Yici Cai, Qiang Zhou, Jin Shi
Robust and efficient algorithms for power grid analysis are crucial for both VLSI design and optimization. As power grids grow in size, IR drop analysis has become more computationally challenging in both runtime and memory consumption. This work presents a fast Poisson solver preconditioned method for unstructured power grids with non-ideal boundary conditions. By taking advantage of the analytical formulation of power grids, this analytical preconditioner can be regarded as a sparse approximate inverse technique. By combining the analytical preconditioner with a robust conjugate gradient method, we demonstrate that the approach is robust for extremely large-scale power grid simulations. Experimental results show that the iteration count of the proposed method hardly increases with grid size once the pad density and the range of the metal resistance value distribution are fixed. The approach solves an unstructured power grid with 2.56M nodes in only one third of the iterations of a classical ICCG solver, and achieves an almost 20X runtime speedup over it.
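As a rough illustration of the preconditioned conjugate gradient iteration that the abstract builds on, the following Python sketch solves a toy IR-drop system. The paper's fast Poisson preconditioner is not reproduced here: a simple Jacobi (diagonal) preconditioner stands in for it, and the four-node resistor chain, the conductance values, and all function names are assumptions made for this example.

```python
def pcg(matvec, b, apply_precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient for a symmetric positive definite system A x = b."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    b_norm = sum(v * v for v in b) ** 0.5
    if b_norm == 0.0:
        return x, 0  # zero right-hand side: x = 0 is already the solution
    z = apply_precond(r)
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sum(v * v for v in r) ** 0.5 <= tol * b_norm:
            return x, iteration
        z = apply_precond(r)
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def chain_grid(n, g, g_pad):
    """Hypothetical toy grid: n >= 2 nodes in a chain, one supply pad at node 0."""
    diag = [2.0 * g] * n
    diag[0] = g + g_pad   # node 0: one neighbour plus the pad
    diag[-1] = g          # last node: one neighbour only

    def matvec(v):
        # Conductance matrix times vector, without storing the matrix.
        out = [diag[k] * v[k] for k in range(n)]
        for k in range(n - 1):
            out[k] -= g * v[k + 1]
            out[k + 1] -= g * v[k]
        return out

    def jacobi(r):
        # Stand-in preconditioner (assumption): divide by the matrix diagonal.
        return [r[k] / diag[k] for k in range(n)]

    return matvec, jacobi


matvec, jacobi = chain_grid(n=4, g=1.0, g_pad=1.0)
load = [0.0, 0.0, 0.0, 1.0]   # unit current drawn at the far end of the chain
drops, iterations = pcg(matvec, load, jacobi)
print(drops, iterations)
```

The preconditioner enters only through `apply_precond`, so a stronger one can be swapped in without touching the iteration itself.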
On rewiring and simplification for canonicity in threshold logic circuits
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105360
Pin-Yi Kuo, Chun-Yao Wang, Ching-Yi Huang
Rewiring is a well-developed and widely used technique in the synthesis and optimization of traditional Boolean logic designs. Threshold logic is an alternative logic representation to Boolean logic that has the characteristic of compact representation. With advances in nanomaterials, research on multi-level synthesis, verification, and testing for threshold networks is flourishing. This paper presents an algorithm for rewiring in a threshold network. It works by removing a target wire and then correcting the circuit's functionality by adding a corresponding rectification network. It also proposes a simplification procedure for representing a threshold logic gate canonically. The experimental results show that our approach achieves a 7.1 times speedup compared to the state-of-the-art multi-level synthesis algorithm when synthesizing a threshold network with a new fanin number constraint.
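For readers unfamiliar with threshold logic, a minimal Python sketch of a single threshold gate follows: the gate outputs 1 when the weighted sum of its binary inputs reaches the threshold. The function names and the example weights are illustrative assumptions and do not come from the paper. The rewiring and simplification procedures themselves are not shown.

```python
from itertools import product


def threshold_gate(weights, threshold, inputs):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def truth_table(weights, threshold):
    """Enumerate the gate output for every input combination, in binary counting order."""
    return [threshold_gate(weights, threshold, bits)
            for bits in product((0, 1), repeat=len(weights))]


and2 = truth_table([1, 1], 2)        # 2-input AND
or2 = truth_table([1, 1], 1)         # 2-input OR
maj3 = truth_table([1, 1, 1], 2)     # 3-input majority in a single gate
and2_alt = truth_table([2, 2], 3)    # different weights, same function as and2
print(and2, or2, maj3, and2_alt)
```

The last two AND variants show that distinct weight-threshold assignments can realize the same function, which is the kind of redundancy a canonical representation has to resolve.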
A heterogeneous accelerator platform for multi-subject voxel-based brain network analysis
Pub Date: 2011-11-07, DOI: 10.5555/2132325.2132413
Yu Wang, Mo Xu, Ling Ren, Xiaorui Zhang, Di Wu, Yong He, Ningyi Xu, Huazhong Yang
Research on understanding the human brain has attracted increasing attention. A promising method is to model the brain as a network based on modern imaging technologies and then apply graph theory algorithms for analysis. In this work, we examine the computing bottleneck of this method and propose a CPU-GPU heterogeneous platform to accelerate the process. We construct a statistical brain network from a sample of 198 people and obtain characteristics such as nodal degree and modularity. This is the first study of voxel-based brain networks on large samples. We also illustrate that a domain-specific hardware platform can have a significant impact on neuroscience studies.
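To make "nodal degree" concrete, here is a small Python sketch that builds a network by thresholding pairwise Pearson correlations between time series and then counts each node's edges. Correlation thresholding is a common construction that is assumed here, because the abstract does not state how the paper builds its network. The four short series, the threshold of 0.5, and the function names are likewise made up for the example.

```python
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def nodal_degrees(series, threshold):
    """Connect two nodes when their correlation exceeds the threshold; return each node's degree."""
    n = len(series)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):   # every unordered pair is visited once
            if pearson(series[i], series[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


signals = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],    # perfectly correlated with the first series
    [5, 4, 3, 2, 1],     # anti-correlated with the first two
    [1, -1, 1, -1, 1],   # uncorrelated with the others
]
degrees = nodal_degrees(signals, threshold=0.5)
print(degrees)
```

The pairwise loop grows quadratically with the number of nodes, so at voxel resolution this step becomes expensive. The sketch does not attempt the CPU-GPU partitioning described in the abstract.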
Fast statistical model of TiO2 thin-film memristor and design implication
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105353
Miao Hu, Hai Helen Li, R. Pino
Emerging memristor devices have received increased attention since HP Labs reported the first TiO2-based memristive structure. Because the device has nano-scale geometry, its uniformity is difficult to control due to process variations in fabrication. The resulting design concerns in a memristor-based computing system, e.g., neuromorphic computing, can be very severe because the analog states of memristors are heavily utilized. Therefore, understanding and quantitatively characterizing the impact of process variations on the electrical properties of memristors becomes crucial for the corresponding VLSI designs. In this work, we examine the theoretical model of TiO2 thin-film memristors and study the relationships between the electrical parameters and the process variations of the devices. A statistical model based on a process-variation-aware memristor device structure is extracted accordingly. Simulations show that our proposed model is 3 to 4 orders of magnitude faster than the existing Monte Carlo simulation method, with only about 2% accuracy degradation. A variable gain amplifier (VGA) is used as a case study to demonstrate the applications of our model in memristor-based circuit designs.
REBEL and TDC: Two embedded test structures for on-chip measurements of within-die path delay variations
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105322
Charles Lamech, Jim Aarestad, J. Plusquellic, R. Rad, K. Agarwal
As feature printability becomes more challenging in advanced technology nodes, measuring and characterizing process variation effects on delay and power is becoming increasingly important. In this paper, we present two embedded test structures (ETS) for carrying out path delay measurement in actual product designs. Of the two structures proposed here, one is designed to be incorporated into a customer's scan structures, augmenting selected functional units with the ability to perform accurate path delay measurements. We refer to this ETS as REBEL (regional delay behavior). It is designed to leverage the existing scan chain as a means of reducing area overhead and performance impact. For cases in which very high resolution of delay measurements is required, a second, standalone structure is proposed, which we refer to as TDC (time-to-digital converter). Beyond characterizing process variations, these ETSs can also be used for design debug, for detection of hardware Trojans and small delay defects, and as physical unclonable functions.
Fast statistical timing analysis for circuits with Post-Silicon Tunable clock buffers
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105314
Bing Li, Ning Chen
Post-Silicon Tunable (PST) clock buffers are widely used in high-performance designs to counter process variations. By allowing delay compensation between consecutive register stages, PST buffers can effectively improve the yield of digital circuits. To date, the evaluation of manufacturing yield in the presence of PST buffers has only been possible using Monte Carlo simulation. In this paper, we propose an alternative method based on graph transformations, which is more than 1000 times faster and computes a parametric minimum clock period. It also identifies the gates that are most critical to circuit performance, thereby enabling a fast analysis-optimization flow.
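The delay compensation mentioned in the abstract can be illustrated with a deliberately small Python sketch: a three-register pipeline in which only the middle register's clock is tunable. The setup-only timing model (no setup time, no hold checks), the stage delays, the discrete tuning steps, and the function name are all assumptions made for this example. The paper's graph transformation method and its statistical treatment are not shown.

```python
def best_tuning(d_in, d_out, tuning_steps):
    """Pick the clock delay x for the middle register of a 3-register pipeline.

    Setup constraints under the simplified model (assumption):
      stage into the tuned register:   T >= d_in - x
      stage out of the tuned register: T >= d_out + x
    Returns (x, T) with the smallest feasible clock period T.
    """
    best = None
    for x in tuning_steps:
        period = max(d_in - x, d_out + x)
        if best is None or period < best[1]:
            best = (x, period)
    return best


untuned = best_tuning(12, 8, [0])           # no tuning available
tuned = best_tuning(12, 8, range(0, 5))     # buffer can add 0..4 delay units
print(untuned, tuned)
```

Delaying the middle clock by 2 units lends time from the short stage to the long one, so the minimum period drops from 12 to 10 in this toy case.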
Bandwidth-aware reconfigurable cache design with hybrid memory technologies
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105304
Jishen Zhao, Cong Xu, Yuan Xie
In chip-multiprocessor (CMP) designs, limited memory bandwidth is a potential bottleneck of system performance. New memory technologies, such as spin-torque-transfer memory (STT-RAM), resistive memory (RRAM), and embedded DRAM (eDRAM), are promising on-chip memory solutions for CMPs. In this paper, we propose a bandwidth-aware reconfigurable cache hierarchy (BARCH) with hybrid memory technologies. BARCH consists of a hybrid cache hierarchy, a reconfiguration mechanism, and a statistical prediction engine. Our hybrid cache hierarchy chooses different memory technologies to configure each level so that the bandwidth provided by the overall hierarchy is optimized. Furthermore, we present a reconfiguration mechanism to dynamically adapt the cache space of each level based on the predicted bandwidth demands of different applications, which is guaranteed by our prediction engine. We evaluate the system performance gain obtained by our method with a set of multithreaded and multiprogrammed applications. Compared to traditional SRAM-based cache designs, our proposed design improves the system throughput by 58% and 14% for multithreaded and multiprogrammed applications, respectively.
PTrace: Derivative-free local tracing of bicriterial design tradeoffs
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105375
Amith Singhee
This paper presents a novel method, PTrace, to locally and uniformly trace convex Pareto-optimal fronts for bicriterial optimization problems that, unlike existing methods, does not require derivatives of the objectives with respect to the design variables. The method computes a sequence of points along the front in a user-specified direction from a starting point, such that the points are roughly uniformly spaced as per a spacing constraint from the user. At each iteration, a local quadratic model of the front is used to estimate an appropriate weighted sum of objectives that, on optimization, will give the next point on the front. A single-objective optimization on this weighted sum then generates the actual point, which is then used to build a new local model. The method uses convexity-based heuristics to improve on mildly sub-optimal results from the optimizer and reuses cached points to improve the optimization speed and quality. We test the method on a synthetic test case and a 6-T SRAM power-performance tradeoff test case to demonstrate its effectiveness.
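The weighted-sum step at the core of the abstract can be sketched in Python as follows. This is not PTrace itself: the weights are picked by hand rather than from a local quadratic model of the front, so the spacing of the resulting points is not controlled. The two quadratic objectives, the search interval, and the use of golden-section search as the derivative-free single-objective optimizer are assumptions chosen to keep the example small.

```python
def golden_section_min(f, lo, hi, tol=1e-8):
    """Derivative-free minimizer for a unimodal function on [lo, hi]."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2


def f1(x):
    return x ** 2            # first objective (hypothetical)


def f2(x):
    return (x - 2) ** 2      # second objective (hypothetical), conflicts with f1 on [0, 2]


def pareto_point(weight):
    """Minimize weight*f1 + (1 - weight)*f2 and return the resulting point on the front."""
    x = golden_section_min(lambda t: weight * f1(t) + (1 - weight) * f2(t), 0.0, 2.0)
    return x, f1(x), f2(x)


front = [pareto_point(w) for w in (0.25, 0.5, 0.75)]
for x, a, b in front:
    print(round(x, 4), round(a, 4), round(b, 4))
```

For these objectives the minimizer of the weighted sum is x = 2(1 - weight), so each weight maps to one point on the convex front. Choosing the next weight to hit a target spacing is the part that the abstract's local quadratic model handles.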
Heterogeneous B*-trees for analog placement with symmetry and regularity considerations
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105378
Pang-Yen Chou, H. Ou, Yao-Wen Chang
Symmetry constraints and regular structures are two major considerations for expert analog layout designers. Symmetry constraints are specified to place matched modules symmetrically with respect to some common axes to reduce unwanted electrical effects. Regular structures are commonly followed by experienced designers to enhance routability and suppress parasitics induced by extra bends of wires and via cost. In this paper, we propose a heterogeneous B*-tree representation to consider symmetry and regularity simultaneously. Corresponding moves and a new regularity cost model for the representation are also presented. Experimental results show that our approach can efficiently generate regularly structured placements satisfying all symmetry constraints. For example, our placer achieves an 18X runtime speedup, 28% smaller area, and 68% shorter wirelength than the previous work, based on placement results, and 60% fewer overflows, 39% fewer vias, and 86% shorter routed wirelength, based on global routing results.
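As background for the representation named in the abstract, the Python sketch below packs modules from a plain B*-tree: a left child is placed immediately to the right of its parent, a right child is placed above its parent at the same x coordinate, and each module drops to the lowest y that clears the modules already placed. It covers only the basic B*-tree packing rule. The heterogeneous extension, symmetry handling, and regularity cost from the paper are not shown, and the class name, module sizes, and the linear-scan contour are assumptions made for the example.

```python
class Node:
    """One module in a B*-tree; left and right are optional child nodes."""

    def __init__(self, name, width, height, left=None, right=None):
        self.name, self.width, self.height = name, width, height
        self.left, self.right = left, right


def pack(root):
    """Return {name: (x, y)} lower-left coordinates from a preorder traversal."""
    placed = []      # (x, y, width, height) of modules placed so far
    coords = {}

    def place(node, x):
        # Lowest feasible y: top edge of the tallest placed module overlapping [x, x + width).
        y = max((py + ph for px, py, pw, ph in placed
                 if px < x + node.width and x < px + pw), default=0)
        placed.append((x, y, node.width, node.height))
        coords[node.name] = (x, y)
        if node.left:
            place(node.left, x + node.width)   # left child: right neighbour of this module
        if node.right:
            place(node.right, x)               # right child: stacked above, same x

    place(root, 0)
    return coords


tree = Node("A", 4, 2,
            left=Node("B", 2, 3),
            right=Node("C", 3, 2, left=Node("D", 2, 1)))
layout = pack(tree)
print(layout)
```

A production packer would keep a contour structure instead of rescanning all placed modules, but the placement rule is the same.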
A framework for accelerating neuromorphic-vision algorithms on FPGAs
Pub Date: 2011-11-07, DOI: 10.1109/ICCAD.2011.6105351
M. DeBole, Ahmed Al-Maashri, M. Cotter, Chi-Li Yu, C. Chakrabarti, N. Vijaykrishnan
Neuromorphic algorithms are traditionally implemented on platforms that consume significant power, falling short of their biological underpinnings. Recent improvements in FPGA technology have made FPGAs a platform on which these rapidly evolving algorithms can be implemented. Unfortunately, implementing designs on FPGAs still proves challenging for non-experts, limiting their use in the neuroscience domain. In this paper, an FPGA framework is presented that enables neuroscientists to compose multi-FPGA systems for a cortical object classification model. This is demonstrated by mapping the algorithm onto two distinct platforms, providing speedups of up to ∼28X over a reference CPU implementation.