
2015 28th IEEE International System-on-Chip Conference (SOCC): Latest Papers

Loop acceleration and instruction repeat support for application specific instruction-set processors
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406957
Zhenzhi Wu, Dake Liu, Xiaoyang Li
Computation-intensive tasks that consist of nested short loops usually suffer from massive control overhead, or from increased memory size when loop unrolling is employed. In this approach, a modified instruction fetch unit with an instruction FIFO and multiple loop controllers allows loops to be performed in hardware and single-execution-cycle instructions to be executed in a self-loop. The optimized processor therefore incurs no loop overhead, while flexibility and instruction granularity are maintained. Special domains for loop and repeat indications are added to the application-specific instructions. The proposed approach achieves dramatic performance and area benefits for many programs dominated by nested short loops, provided the loops are determinable.
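To make the loop overhead mentioned in this abstract concrete, the following is a minimal cycle-count sketch, not the authors' design. It assumes a hypothetical software-controlled loop that spends two extra single-cycle instructions per iteration (a counter update and a branch) and a hardware loop controller that spends none; the function names and the overhead figure are illustrative assumptions.

```python
def software_loop_cycles(body_cycles, trip_counts, overhead_per_iter=2):
    """Cycles for a perfectly nested loop under software loop control.

    trip_counts lists iteration counts from the outermost to the innermost loop.
    Every iteration at every nesting level pays overhead_per_iter extra cycles
    (assumed here: one counter update plus one branch).
    """
    cycles = body_cycles
    for trips in reversed(trip_counts):  # build up from the innermost loop
        cycles = trips * (cycles + overhead_per_iter)
    return cycles


def hardware_loop_cycles(body_cycles, trip_counts):
    """Cycles when loop control is handled in hardware with zero overhead."""
    cycles = body_cycles
    for trips in trip_counts:
        cycles *= trips
    return cycles


if __name__ == "__main__":
    sw = software_loop_cycles(4, [8, 16])
    hw = hardware_loop_cycles(4, [8, 16])
    print(sw, hw)  # 784 512
```

Under these assumptions a 4-cycle body nested 8 x 16 deep takes 784 cycles with software loop control and 512 without it; the shorter the loop body, the larger the share of cycles spent on control.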
Citations: 1
Five forces shaping the silicon world: Advanced sensing and intelligence in IoT and vision
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406942
C. Rowen
The cumulative improvement in digital silicon density, energy and performance has had an impressive quantitative impact on the world we live in. But new forces, embodied in radical changes in system applications, are rapidly disrupting traditional silicon architectures. In this talk we chart five of the major forces at work in silicon systems, and explore new categories of “things that sense and see”. Along the way, we visit some fundamental shifts taking place in low-energy processor cores, in vision DSPs, and in systems for “deep learning” that now exceed human capabilities.
Citations: 0
Symmetric write operation for 1T-1MTJ STT-RAM cells using negative bitline technique
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406948
H. Farkhani, A. Peiravi, J. K. Madsen, F. Moradi
In this paper, a new write assist technique is proposed to improve the write characteristics of the 1T-1MTJ STT-RAM bitcell through a symmetric write operation. This is done by applying a negative voltage to the bitline during the write '1' operation. The proposed technique is compared with the best previously proposed techniques. Simulation results in 65 nm CMOS technology show that the proposed write assist technique yields a 19% improvement in write energy compared to the boosted wordline technique. In addition, it reduces the access transistor width by 12% and 48% compared with the boosted wordline and balanced write techniques, respectively. Furthermore, the maximum voltage across the MTJ is reduced by 20% and 6% compared with the boosted wordline and balanced write techniques, respectively.
Citations: 5
On microarchitectural modeling for CNFET-based circuits
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406982
Tianjian Li, Hao Chen, Weikang Qian, Xiaoyao Liang, Li Jiang
Carbon Nanotube Field-Effect Transistors (CNFETs) show great promise as an alternative to traditional CMOS technology, due to their extremely high energy efficiency. Unfortunately, the lack of control over the Carbon NanoTube (CNT) growth process causes CNFET circuits to suffer from CNT count variation, which degrades circuit performance. Compared to CMOS process variation, CNT count variation exhibits asymmetric spatial correlation. In this work, we propose an analytic model that integrates the impact of the asymmetric spatial correlation into the key microarchitectural blocks. We use this model to evaluate the variations in circuit performance for different layout styles and microarchitectural parameters. We further explore the opportunity of leveraging the asymmetric spatial correlation for performance enhancement. Experimental results based on SPICE and architectural simulations show the accuracy and effectiveness of the proposed model.
Citations: 4
An A-SAR ADC circuit with adaptive auxiliary comparison scheme
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406939
Suresh Koyada, Abhilash Karnatakam Nagabhushana, Stefan Leitner, Haibo Wang
This paper extends the accelerated-SAR (A-SAR) technique, which was previously implemented in a voltage-to-time converter (VTC) based ADC circuit, to mainstream voltage-comparison based ADC circuits. In VTC-based A-SAR ADC circuits, the levels for auxiliary comparison can be easily generated. Producing such auxiliary levels in voltage-comparison based circuits is more complicated, and techniques to cope with this design challenge are discussed in the paper. In addition, this work further enhances the efficiency of the A-SAR technique by introducing adaptive auxiliary level selection. System-level simulations show that the proposed adaptive auxiliary level selection method significantly outperforms the previous approach that uses fixed auxiliary levels. Circuit techniques to implement the adaptive methods are also presented. The proposed method and the developed circuit techniques are implemented in 10-bit ADC circuits. The performance of the A-SAR ADC is compared with that of a conventional SAR ADC, and the comparison demonstrates the benefits of the proposed techniques.
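For readers unfamiliar with the baseline, the conventional SAR conversion that this abstract compares against can be sketched as a bit-by-bit binary search. This is a minimal behavioral sketch under stated assumptions (ideal comparator, ideal DAC, unipolar input between 0 and a hypothetical `vref`); it does not model the A-SAR auxiliary comparisons, and the function name is illustrative.

```python
def sar_convert(vin, vref=1.0, bits=10):
    """Conventional successive-approximation conversion (ideal components).

    Tests one bit per cycle from MSB to LSB: tentatively set the bit, compare
    the input against the resulting DAC level, and keep the bit only if the
    input is at or above that level.
    """
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        dac_level = trial * vref / (1 << bits)  # ideal DAC output for the trial code
        if vin >= dac_level:
            code = trial
    return code


if __name__ == "__main__":
    print(sar_convert(0.5))   # 512
    print(sar_convert(0.25))  # 256
```

A 10-bit conversion always takes ten comparison cycles in this baseline, which is the cost that acceleration schemes aim to reduce.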
Citations: 2
"Venice: A cost-effective architecture for datacenter servers"
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406895
Rui Hou
Dr. Rui Hou is VP, Processor Design, of Suzhou PowerCore Technology. He received his Bachelor's and Master's degrees in computer science from Harbin Institute of Technology in 1999 and 2003, respectively, and earned his Ph.D. in computer science from the Institute of Computing Technology of the Chinese Academy of Sciences in 2007. His main research interests are in the areas of data center systems and high-performance CPUs. Dr. Hou is currently leading a team to develop a high-performance server processor based on IBM's Power technology. He has led the design and development of an ARMv8-based many-core processor with a brand-new SMT-4 core that his team designed from scratch. He has built prototype systems enabling efficient resource sharing and high-throughput computing inside data centers. Dr. Hou is also an associate professor at the Institute of Computing Technology. Before joining ICT in 2011, he worked at IBM China Research Lab for four years. He has published over 20 peer-reviewed papers in various international conferences and journals, and filed more than 50 patent applications.
Citations: 0
A digital background calibration technique for split DAC based SAR ADC by using redundant cycle
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406952
Wuguang Wang, R. Huang, Guoquan Sun, Weijun Mao, Xiaolei Zhu
A digital background calibration technique for split CDAC mismatch is proposed. It uses the dummy capacitor to generate an extra calibration bit. The mismatch of the CDAC array is detected by the calibration bit and fed back to the compensation capacitor. A 9b 100MS/s SAR ADC is demonstrated in standard 65 nm CMOS technology. Simulation results show that the DNL and INL can be decreased to ±0.1 LSB and +0.11/-0.13 LSB, respectively, after using this technique. The proposed calibration block consumes only 50 μW from a 1.2 V supply.
Citations: 3
An accelerator for classification using radial basis function neural network
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406928
M. Mohammadi, Rohit Ronge, J. Chandiramani, S. Nandy
A scalable and reconfigurable architecture for accelerating classification using a Radial Basis Function Neural Network (RBFNN) is presented in this paper. The proposed accelerator comprises a set of interconnected HyperCells, which serve as the reconfigurable datapath on which the RBFNN is realized. The dimensions of the RBFNN that can be supported on the implemented design are limited by the fixed number of HyperCells. To resolve this limitation, a folding strategy is discussed which provides a generic hardware solution for classification using RBFNN, with no constraint on the dimensions of inputs and outputs. The performance of the RBFNN implemented on a network of HyperCells, using the Xilinx Virtex 7 XC7V2000T as the target FPGA, is compared with software and GPU implementations of the RBFNN. Our results show speedups of 1.91X-15.94X over an equivalent software implementation on an Intel Core 2 Quad and 1.33X-14.6X over a GPU (NVIDIA GTX650).
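As a point of reference for what is being accelerated, a software RBFNN classifier can be sketched as follows. This is a minimal sketch, not the HyperCell datapath: it assumes Gaussian basis functions with a single shared width `sigma` and a linear output layer, neither of which the abstract specifies, and all names are hypothetical.

```python
import math


def rbf_classify(x, centers, sigma, weights):
    """Classify x with a radial basis function network.

    centers: list of center vectors, one per hidden unit.
    weights: weights[k][j] connects hidden unit j to output class k.
    Returns (predicted_class_index, list_of_class_scores).
    """
    # Hidden layer: Gaussian response to the squared distance from each center.
    hidden = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * sigma ** 2))
        for c in centers
    ]
    # Output layer: weighted sum of hidden responses per class.
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    centers = [[0.0, 0.0], [1.0, 1.0]]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    print(rbf_classify([0.1, 0.0], centers, 0.5, weights)[0])  # 0
    print(rbf_classify([0.9, 1.0], centers, 0.5, weights)[0])  # 1
```

The work per input grows with the number of centers times the input dimension plus the number of centers times the number of classes, which is why the supported network dimensions matter for a fixed-size hardware datapath.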
Citations: 4
Low-voltage 9T FinFET SRAM cell for low-power applications
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406929
F. Moradi, Mohammad Tohidi
In this paper, a novel multi-threshold 9T-SRAM cell using FinFET technology is proposed, with improved read and write margins in comparison with the standard 6T-SRAM cell. Using this bit-cell at a supply voltage of 200mV (800mV), read and write margins are improved by 92% (97%) and 2X (40%), respectively. The proposed design operates at supply voltages lower than 300mV, which results in 3X lower power consumption compared to the standard 6T-SRAM cell.
Citations: 2
A high throughput router with a novel switch allocator for network on chip
Pub Date : 2015-09-01 DOI: 10.1109/SOCC.2015.7406932
P. Yan, Shixiong Jiang, R. Sridhar
As industry moves towards many-core chips, conventional bus and crossbar interconnections often struggle to meet multi-core communication requirements. Network on Chip (NoC) has been proposed to replace global interconnections and alleviate this problem. In a NoC, routers are used to exchange data between IPs, so router performance directly impacts the efficiency of the entire system. The key components of a modern router include Route Computation (RC), Virtual-channel Allocation (VA), Switch Allocation (SA) and Switch Traversal (ST). In this paper, we present a new router architecture that significantly improves throughput while keeping the area overhead low. In this approach, we redesign the SA's first-stage arbiters as priority-based dynamic arbiters using a round-robin algorithm. The modified unit increases the likelihood that the SA's first-stage arbiters choose requests for different output ports. Hence, in the second stage of the SA, competition for output ports is reduced, allowing more flits to travel through the crossbar in one cycle and increasing throughput. Our results show that the new design can improve throughput by up to 13% for a router with eight virtual channels. The new arbiter also has lower worst-case latency, which can help the system increase its operating frequency.
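The round-robin arbitration that this abstract builds on can be sketched behaviorally as below. This is a plain round-robin arbiter under stated assumptions, not the authors' priority-based dynamic arbiter; the class and method names are hypothetical.

```python
class RoundRobinArbiter:
    """Behavioral model of a round-robin arbiter with a rotating priority pointer."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.pointer = 0  # input that has highest priority in the next arbitration

    def grant(self, requests):
        """Return the index of the granted requester, or None if nobody requests.

        requests is a list of booleans, one per input. The search starts at the
        pointer and wraps around; the winner becomes lowest priority next time.
        """
        for offset in range(self.num_inputs):
            idx = (self.pointer + offset) % self.num_inputs
            if requests[idx]:
                self.pointer = (idx + 1) % self.num_inputs
                return idx
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(4)
    requests = [True, True, False, True]
    print([arbiter.grant(requests) for _ in range(4)])  # [0, 1, 3, 0]
```

Rotating the pointer past each winner is what gives every requester a bounded wait; a switch allocator would instantiate one such arbiter per input port in the first stage and one per output port in the second.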
Citations: 11