Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406957
Zhenzhi Wu, Dake Liu, Xiaoyang Li
Computation-intensive tasks that consist of nested short loops usually suffer from massive control overhead, or from increased memory size when loop unrolling is employed. In this approach, a modified instruction fetch unit with an instruction FIFO and multiple loop controllers is introduced, so that loops can be performed in hardware and single-execution-cycle instructions can be executed in a self-loop. The optimized processor therefore incurs no loop overhead. Flexibility and instruction granularity are maintained. Special fields for loop and repeat indications are added to the application-specific instructions. The proposed approach achieves dramatic performance and area benefits for many programs dominated by nested short loops where the loops are determinable.
Title: Loop acceleration and instruction repeat support for application specific instruction-set processors. Venue: 2015 28th IEEE International System-on-Chip Conference (SOCC).
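As a rough illustration of the loop-overhead argument in the loop-acceleration abstract above, the following sketch counts cycles for a nest of short loops under a simple hypothetical cost model: every body instruction takes one cycle, and software loop control costs `overhead_per_iter` extra cycles per iteration at every nesting level (counter update plus branch). The function names and the cost model are assumptions made for illustration; they are not figures or interfaces from the paper.

```python
def total_iterations(trip_counts):
    """Number of innermost-body executions for a loop nest."""
    total = 1
    for trips in trip_counts:
        total *= trips
    return total


def software_loop_cycles(body_len, trip_counts, overhead_per_iter=2):
    """Cycles with software loop control (hypothetical cost model).

    trip_counts lists iteration counts from outermost to innermost.
    Each iteration of every loop level pays overhead_per_iter cycles;
    only the innermost loop contains the body_len body instructions.
    """
    cycles = 0
    iterations = 1
    for trips in trip_counts:
        iterations *= trips                      # iterations executed at this level
        cycles += iterations * overhead_per_iter
    return cycles + iterations * body_len


def hardware_loop_cycles(body_len, trip_counts):
    """Cycles when loop control is done by hardware loop controllers,
    so that only body instructions occupy issue slots."""
    return total_iterations(trip_counts) * body_len
```

With a two-instruction body and trip counts of 4 (outer) and 8 (inner), this model gives 136 cycles with software loop control and 64 cycles with hardware loop control, which shows why short loop bodies are the case where control overhead dominates.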
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406942
C. Rowen
The cumulative improvement in digital silicon density, energy and performance has had an impressive quantitative impact on the world we live in. But new forces, embodied in radical changes in system applications, are rapidly disrupting traditional silicon architectures. In this talk we chart five of the major forces at work in silicon systems, and explore new categories of “things that sense and see”. Along the way, we visit some fundamental shifts taking place in low-energy processor cores, in vision DSPs, and in systems for “deep learning” that now exceed human capabilities.
Title: Five forces shaping the silicon world: Advanced sensing and intelligence in IoT and vision
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406948
H. Farkhani, A. Peiravi, J. K. Madsen, F. Moradi
In this paper, a new write-assist technique is proposed to improve the write characteristics of the 1T-1MTJ STT-RAM bitcell through a symmetric write operation. This is done by applying a negative voltage to the bitline during the write '1' operation. The proposed technique is compared with the best previously proposed techniques. Simulation results using 65 nm CMOS technology show that the proposed write-assist technique yields a 19% improvement in write energy compared with the boosted wordline technique. In addition, it leads to 12% and 48% reductions in access transistor width compared with the boosted wordline and balanced write techniques, respectively. Furthermore, the maximum voltage across the MTJ is reduced by 20% and 6% compared with the boosted wordline and balanced write techniques, respectively.
Title: Symmetric write operation for 1T-1MTJ STT-RAM cells using negative bitline technique
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406982
Tianjian Li, Hao Chen, Weikang Qian, Xiaoyao Liang, Li Jiang
Carbon nanotube field-effect transistors (CNFETs) show great promise as an alternative to traditional CMOS technology because of their extremely high energy efficiency. Unfortunately, the lack of control over the carbon nanotube (CNT) growth process causes CNFET circuits to suffer from CNT count variation, which degrades circuit performance. Compared with CMOS process variation, CNT count variation exhibits asymmetric spatial correlation. In this work, we propose an analytic model that integrates the impact of the asymmetric spatial correlation into the key microarchitectural blocks. We use this model to evaluate the variations in circuit performance for different layout styles and microarchitectural parameters. We further explore the opportunity of leveraging the asymmetric spatial correlation for performance enhancement. Experimental results based on SPICE and architectural simulations show the accuracy and effectiveness of the proposed model.
Title: On microarchitectural modeling for CNFET-based circuits
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406939
Suresh Koyada, Abhilash Karnatakam Nagabhushana, Stefan Leitner, Haibo Wang
This paper extends the accelerated-SAR (A-SAR) technique, which was previously implemented in a voltage-to-time converter (VTC) based ADC circuit, to mainstream voltage-comparison-based ADC circuits. In the design of VTC-based A-SAR ADC circuits, the levels for auxiliary comparison can be generated easily. Producing such auxiliary levels in voltage-comparison-based circuits is more complicated, and techniques to cope with this design challenge are discussed in the paper. In addition, this work further enhances the efficiency of the A-SAR technique by introducing adaptive auxiliary level selection. System-level simulations show that the proposed adaptive auxiliary level selection method significantly outperforms the previous approach, which uses fixed auxiliary levels. Circuit techniques to implement the adaptive method are also presented. The proposed method and the developed circuit techniques are implemented in 10-bit ADC circuits. The performance of the A-SAR ADC is compared with that of a conventional SAR ADC, and the comparison demonstrates the benefits of the proposed techniques.
Title: An A-SAR ADC circuit with adaptive auxiliary comparison scheme
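For context on the baseline that the A-SAR abstract above compares against, here is a minimal sketch of a conventional SAR conversion: a binary search that spends one comparator decision per bit, most significant bit first. It assumes an ideal DAC and comparator, and the function name and parameters are illustrative assumptions. It does not model the auxiliary comparisons or the adaptive level selection of the A-SAR technique, whose details are not given in the abstract.

```python
def sar_convert(vin, vref=1.0, bits=10):
    """Conventional SAR conversion with an ideal DAC and comparator.

    Returns the output code in the range 0 .. 2**bits - 1.
    """
    code = 0
    for bit in range(bits - 1, -1, -1):
        trial = code | (1 << bit)                # tentatively set this bit
        dac_level = vref * trial / (1 << bits)   # ideal DAC output for the trial code
        if vin >= dac_level:                     # comparator decision
            code = trial                         # keep the bit, otherwise leave it cleared
    return code
```

A 10-bit conversion in this baseline always takes ten comparison cycles regardless of the input, which is the cost that acceleration schemes aim to reduce.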
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406895
Rui Hou
Dr. Rui Hou is VP, Processor Design, of Suzhou PowerCore Technology. He received his Bachelor's and Master's degrees in computer science from Harbin Institute of Technology in 1999 and 2003, respectively, and earned his Ph.D. in computer science from the Institute of Computing Technology (ICT) of the Chinese Academy of Sciences in 2007. His main research interests are in the areas of data center systems and high-performance CPUs. Dr. Hou is currently leading a team to develop a high-performance server processor based on IBM's Power technology. He has led the design and development of an ARMv8-based many-core processor with a brand-new SMT-4 core that his team designed from scratch. He has built prototype systems enabling efficient resource sharing and high-throughput computing inside data centers. Dr. Hou is also an associate professor at the Institute of Computing Technology. Before joining ICT in 2011, he worked at IBM China Research Lab for four years. He has published over 20 peer-reviewed papers in various international conferences and journals, and filed more than 50 patent applications.
Title: Venice: A cost-effective architecture for datacenter servers
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406952
Wuguang Wang, R. Huang, Guoquan Sun, Weijun Mao, Xiaolei Zhu
A digital background calibration technique for split CDAC mismatch is proposed. It uses the dummy capacitor to generate an extra calibration bit. The mismatch of the CDAC array is detected by the calibration bit and fed back to the compensation capacitor. A 9-bit 100 MS/s SAR ADC is demonstrated in standard 65 nm CMOS technology. Simulation results show that the DNL and INL can be reduced to ±0.1 LSB and +0.11/-0.13 LSB, respectively, with this technique. The proposed calibration block consumes only 50 μW from a 1.2 V supply.
Title: A digital background calibration technique for split DAC based SAR ADC by using redundant cycle
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406928
M. Mohammadi, Rohit Ronge, J. Chandiramani, S. Nandy
A scalable and reconfigurable architecture for accelerating classification using a Radial Basis Function Neural Network (RBFNN) is presented in this paper. The proposed accelerator comprises a set of interconnected HyperCells, which serve as the reconfigurable datapath on which the RBFNN is realized. The dimensions of the RBFNN that can be supported on an implemented design are limited by the fixed number of HyperCells. To resolve this limitation, a folding strategy is discussed that provides a generic hardware solution for classification using RBFNN, with no constraint on the dimensions of inputs and outputs. The performance of the RBFNN implemented on a network of HyperCells, with the Xilinx Virtex 7 XC7V2000T as the target FPGA, is compared with software and GPU implementations of the RBFNN. Our results show speedups of 1.91X-15.94X over an equivalent software implementation on an Intel Core 2 Quad and 1.33X-14.6X over a GPU (NVIDIA GTX650).
Title: An accelerator for classification using radial basis function neural network
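To make concrete the computation that the RBFNN accelerator abstract above targets, the sketch below shows a plain software forward pass of an RBF network classifier. It assumes Gaussian basis functions and a linear output layer with the class chosen by the largest score; these choices, and all names and parameters, are illustrative assumptions, since the abstract does not specify the basis function or the network layout.

```python
import math


def rbf_classify(x, centers, widths, weights):
    """Return the index of the highest-scoring class.

    centers: list of hidden-unit center vectors
    widths:  Gaussian width (sigma) of each hidden unit
    weights: weights[k][j] is the weight from hidden unit j to class k
    """
    hidden = []
    for center, sigma in zip(centers, widths):
        # Squared Euclidean distance between the input and this center
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        hidden.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    # Linear output layer: one weighted sum of hidden activations per class
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)
```

The distance computations and the weighted sums are independent across hidden units and classes, which is the kind of regular parallelism a reconfigurable datapath can exploit.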
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406929
F. Moradi, Mohammad Tohidi
In this paper, a novel multi-threshold 9T-SRAM cell using FinFET technology is proposed, with improved read and write margins in comparison with the standard 6T-SRAM cell. With this bitcell at a supply voltage of 200 mV (800 mV), the read and write margins are improved by 92% (97%) and 2X (40%), respectively. The proposed design operates at supply voltages below 300 mV, which results in 3X lower power consumption compared with the standard 6T-SRAM cell.
Title: Low-voltage 9T FinFET SRAM cell for low-power applications
Pub Date: 2015-09-01 DOI: 10.1109/SOCC.2015.7406932
P. Yan, Shixiong Jiang, R. Sridhar
As industry moves towards many-core chips, conventional bus and crossbar interconnections often struggle to meet multi-core communication requirements. Network on Chip (NoC) has been proposed to replace global interconnections and alleviate this problem. In an NoC, routers are used to exchange data between IPs, so router performance directly impacts the efficiency of the entire system. The key components of a modern router include Route Computation (RC), Virtual-channel Allocation (VA), Switch Allocation (SA) and Switch Traversal (ST). In this paper, we present a new router architecture that significantly improves throughput while keeping the area overhead low. In this approach, we redesign the first-stage arbiters of the SA as priority-based dynamic arbiters using a round-robin algorithm. The modified unit increases the likelihood that the first-stage arbiters choose requests for different output ports. Hence, in the second stage of the SA, competition for output ports is reduced, allowing more flits to traverse the crossbar in one cycle and increasing throughput. Our results show that the new design can improve throughput by up to 13% for a router with eight virtual channels. Also, the new arbiter has lower worst-case latency, which can help the system increase its operating frequency.
Title: A high throughput router with a novel switch allocator for network on chip
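As background for the switch-allocator abstract above, the following is a minimal sketch of a plain round-robin arbiter, the building block that the abstract says the first-stage arbiters are based on. It is not the paper's priority-based dynamic arbiter, whose design is not described in the abstract; the class name and interface are assumptions for illustration.

```python
class RoundRobinArbiter:
    """Grant one of several requesters per cycle with rotating priority."""

    def __init__(self, num_requesters):
        self.num_requesters = num_requesters
        self.pointer = 0  # requester with the highest priority this cycle

    def grant(self, requests):
        """requests: list of booleans, one per requester.

        Returns the granted index, or None if nobody is requesting.
        """
        for offset in range(self.num_requesters):
            idx = (self.pointer + offset) % self.num_requesters
            if requests[idx]:
                # The winner becomes lowest priority in the next cycle
                self.pointer = (idx + 1) % self.num_requesters
                return idx
        return None
```

For example, with four requesters where 0 and 2 request continuously, successive calls grant 0, then 2, then 0 again, so neither requester is starved.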