2014 IEEE Computer Society Annual Symposium on VLSI: Latest Publications

Data Correlation Aware Serial Encoding for Low Switching Power On-Chip Communication
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.48
Somrita Ghosh, P. Ghosal, Nabanita Das, S. Mohanty, Oghenekarho Okobiah
Achieving lightning-fast data communication in Chip Multi-Processor (CMP) based systems as well as Networks-on-Chip (NoCs) is always desired for target performance. Data communication links inside the communication fabric of CMP or NoC architectures have a strong impact on their performance and power dissipation. Several approaches exist to reduce the power dissipation of parallel-link on-chip interconnects, but very few techniques have been reported for power reduction in serial links. The existing serial-link power reduction techniques do not necessarily account for the correlation exhibited in the data and hence are limited in terms of accuracy. In this paper, a novel data encoding scheme is proposed for serial links that decreases the number of self transitions and thereby reduces the power of data transmission. The proposed scheme accounts for the correlations in the data and hence is more effective for real-life applications. The system architecture as well as the encoding and decoding schemes have been implemented to explore the proposed algorithm, which is applicable to any CMP or NoC architecture. The proposed encoding scheme has been analyzed with various types of real-life data streams. Experimental results show that up to 27% reduction in power dissipation is possible in NoC links with the proposed scheme.
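As a rough illustration of the quantity this abstract targets, the sketch below counts bit transitions on a serial link and compares a raw stream with a simple XOR-with-previous-word encoding. The XOR encoding is an assumption chosen only because it exploits word-to-word correlation; it is not the scheme proposed in the paper, and the function names and the 8-bit word width are hypothetical.

```python
def serialize(words, width=8):
    """Flatten words into a list of bits, most significant bit first."""
    return [(w >> i) & 1 for w in words for i in range(width - 1, -1, -1)]

def count_transitions(bits):
    """Number of 0->1 or 1->0 changes between consecutive bits on the wire."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

def xor_encode(words):
    """Send each word XORed with the previous word (illustrative assumption)."""
    prev, out = 0, []
    for w in words:
        out.append(w ^ prev)
        prev = w
    return out

def xor_decode(codes):
    """Invert xor_encode by accumulating the XOR of received codes."""
    prev, out = 0, []
    for c in codes:
        prev ^= c
        out.append(prev)
    return out

# Strongly correlated words: consecutive values differ in only a few bits.
data = [0b10101010, 0b10101011, 0b10101010, 0b10101110]
raw = count_transitions(serialize(data))
enc = count_transitions(serialize(xor_encode(data)))
print(raw, enc)
```

For correlated data such as this, the encoded stream is mostly zeros after the first word, so it toggles the wire less often; for uncorrelated data the same encoding may not help at all.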
Citations: 16
A New Walsh Hadamard Transform Architecture Using Current Mode Circuit
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.71
S. Bhattacharya, S. Talapatra
With the reduction of supply voltage motivated by power reduction, the signal-to-noise ratio of digital signals has decreased. Alternatively, a signal can be represented as a current while the supply voltage remains small. This gives rise to the field of current-mode signal processing circuits. In this work, we propose a current-mode analog Walsh-Hadamard processor whose control mechanism remains digital. The design is implemented in 0.35μm CMOS technology. The Walsh-Hadamard transform is a complete transform and finds significant applications in image processing, filter design, and multiplexing. To the best of our knowledge, no such implementation exists in the published literature.
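For readers unfamiliar with the transform itself, the following is a minimal software reference of the unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order. It only illustrates the mathematics; it says nothing about the analog current-mode circuit described in the abstract, and the function name is hypothetical.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform, natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly stage: combine elements that are h apart with sum/difference.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a

signal = [1, 0, 1, 0, 0, 1, 1, 0]
spectrum = fwht(signal)
print(spectrum)
```

Because the transform uses only additions and subtractions, applying it twice returns the input scaled by its length, which is a convenient sanity check.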
Citations: 2
A Fast Hypergraph Bipartitioning Algorithm
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.58
Wenzan Cai, Evangeline F. Y. Young
In this paper, we focus on the hypergraph bipartitioning problem and present a new multilevel hypergraph partitioning algorithm that is much faster than hMETIS while delivering similar quality. In the coarsening phase, successively coarser hypergraphs are constructed using the MFCC (Modified First-Choice Coarsening) algorithm. Once the hypergraph contains only a small number of vertices, we use a randomized algorithm to obtain an initial partition and then apply an A-FM (Alternating Fiduccia-Mattheyses) refinement algorithm to optimize it. In the uncoarsening phase, we extract clusters level by level and apply A-FM repeatedly. Experiments on the large benchmarks issued in the DAC 2012 Routability-Driven Placement Contest show that we achieve similar or even better quality (1% improvement in minimum cut on average) and save 50% to 80% of the running time compared with the state-of-the-art partitioner hMETIS.
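The objective behind this abstract can be made concrete with a small sketch: the cut size of a bipartition and the gain of moving one vertex, which is the quantity Fiduccia-Mattheyses-style refinement works with. This is a naive recomputation written for clarity, not the A-FM algorithm from the paper (whose details the abstract does not give); the function names and the tiny hypergraph are hypothetical.

```python
def cut_size(hyperedges, side):
    """Count hyperedges that have vertices on both sides of the bipartition."""
    return sum(1 for edge in hyperedges if len({side[v] for v in edge}) == 2)

def move_gain(hyperedges, side, v):
    """Reduction in cut size if vertex v is moved to the other side."""
    flipped = dict(side)
    flipped[v] = 1 - side[v]
    return cut_size(hyperedges, side) - cut_size(hyperedges, flipped)

# Hypothetical 4-vertex hypergraph; side[v] is 0 or 1.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "c", "d")]
side = {"a": 0, "b": 0, "c": 1, "d": 1}

print(cut_size(edges, side))        # hyperedges currently cut
print(move_gain(edges, side, "b"))  # positive gain: moving b reduces the cut
print(move_gain(edges, side, "a"))  # negative gain: moving a increases the cut
```

A practical refinement pass would keep gains in bucket lists and update them incrementally instead of recomputing the cut for every candidate move, and would also enforce a balance constraint between the two sides.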
Citations: 2
Trust No One: Thwarting "heartbleed" Attacks Using Privacy-Preserving Computation
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.86
N. G. Tsoutsos, M. Maniatakos
A security bug in the OpenSSL library, codenamed Heartbleed, allowed attackers to read the contents of the corresponding server's memory, effectively revealing passwords, master keys, and users' session cookies. As long as the server memory contents are in the clear, it is a matter of time until the next bug or attack hands information over to attackers. In this paper, we investigate the applicability of privacy-preserving general-purpose computation, which would potentially render any leaked information indecipherable to attackers. Privacy is ensured by the use of homomorphically encrypted memory contents. To this end, we explore the boundaries of general-purpose computation constrained for user data privacy. Specifically, we explore the minimum amount of information required for general-purpose computation, which typically requires control flow and branches, and to what extent such information can be kept private from threats that have theoretically unlimited resources, including access to the internals of a target system.
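The idea of computing on encrypted memory contents can be illustrated with a toy example. The abstract does not name the encryption scheme used, so the sketch below uses textbook RSA purely as an assumption, because its multiplicative homomorphism is easy to see: multiplying two ciphertexts yields a ciphertext of the product. The parameters are tiny demonstration values and the scheme as written is insecure.

```python
# Textbook RSA with tiny demonstration parameters (insecure, illustration only).
P, Q = 61, 53
N = P * Q            # modulus, 3233
E, D = 17, 2753      # E * D == 1 modulo (P - 1) * (Q - 1)

def encrypt(m):
    return pow(m, E, N)

def decrypt(c):
    return pow(c, D, N)

def multiply_encrypted(c1, c2):
    """Multiply ciphertexts; the result decrypts to the plaintext product mod N."""
    return (c1 * c2) % N

c_a, c_b = encrypt(7), encrypt(3)
c_prod = multiply_encrypted(c_a, c_b)   # computed without ever decrypting
print(decrypt(c_prod))
```

A party holding only `c_a` and `c_b` can produce `c_prod` without learning 7 or 3, which is the property that would make leaked memory contents useless to an attacker. General-purpose computation needs considerably more than a single homomorphic operation, which is the boundary the abstract sets out to explore.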
Citations: 7
Theory, Synthesis, and Application of Adiabatic and Reversible Logic Circuits for Security Applications
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.88
Matthew Morrison
Programmable reversible logic is emerging as a prospective logic design style for implementation in modern nanotechnology and quantum computing with minimal impact on circuit heat generation. Adiabatic logic is a design methodology for reversible logic in CMOS where the current flow through the circuit is controlled such that the energy dissipation due to switching and capacitor dissipation is minimized. Production of cost-effective secure integrated chips, such as smart cards, requires hardware designers to consider tradeoffs in size, security, and power consumption. For security-centric designs to succeed, the low-level hardware must contain built-in protection mechanisms that supplement cryptographic algorithms such as AES and Triple DES by preventing side-channel attacks such as Differential Power Analysis (DPA). Dynamic logic obfuscates the output waveforms and the circuit operation, reducing the effectiveness of the DPA attack. In this dissertation, I address the theory, synthesis, and application of adiabatic and reversible logic circuits for security applications. First, we present a mathematical proof to demonstrate that reversible logic can be used to design sequential computing structures. Next, a novel algorithm for the synthesis of adiabatic circuits in CMOS is presented. This approach is unique because it correlates the offsets in the permutation matrix to the transistors required for synthesis, instead of determining an equivalent circuit and substituting a previously synthesized circuit from a library. Applying the ESPRESSO heuristic for Boolean function minimization to each output node in parallel, we optimize the synthesized circuit. It is demonstrated that the algorithm produces a 32.86% improvement over previously synthesized circuit benchmarks. For stronger mitigation of DPA attacks, we propose the implementation of Adiabatic Dynamic Differential Logic for applications in secure IC design. A Performance Adiabatic Dynamic Differential Logic (PADDL) is presented for implementation in high-frequency secure ICs. This method improves the differential power over previous dynamic and differential logic methods by up to 89.65. Then, we present an adiabatic S-box which significantly reduces energy imbalance compared to previous benchmarks. The design is capable of forward encryption and reverse decryption with minimal overhead, allowing for efficient hardware reuse.
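The abstract relies on the fact that a reversible gate is a permutation of its input patterns. The sketch below checks that property for the Toffoli gate, a standard textbook reversible gate chosen here as an assumption (the abstract does not name specific gates), and contrasts it with an ordinary AND gate, which loses information. All function names are hypothetical.

```python
from itertools import product

def toffoli(a, b, c):
    """Controlled-controlled-NOT: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)

def and_gate(a, b):
    """Ordinary AND padded to two outputs; distinct inputs collide."""
    return a & b, 0

def is_reversible(gate, n_bits):
    """A gate is reversible iff it maps the 2**n input patterns one-to-one."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_bits)}
    return len(outputs) == 2 ** n_bits

print(is_reversible(toffoli, 3))   # the truth table is a permutation
print(is_reversible(and_gate, 2))  # several inputs share one output
```

Since the Toffoli truth table is a permutation, it can be written as a permutation matrix, which is the kind of object the synthesis algorithm in the abstract operates on.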
Citations: 5
A CMOS Temperature Sensor with -0.34°C to 0.27°C Inaccuracy from -30°C to 80°C
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.30
Hai Chi, Tom Chen
With growing applications and increased integration of functionalities on multi-electrode biosensors, more attention is being paid to the need for on-chip temperature measurement, both to monitor the ambient temperature of bio-samples and to record the heat generated by biosensor chips and its potential damage to bio-samples. This paper presents an integrated temperature sensor design which is intended to provide ambient temperature monitoring in a highly integrated biosensor system. Special attention was paid to improving power supply rejection (PSR) performance at the 1MHz clock frequency of the integrated biosensor system using PSR-enhanced OTAs. The temperature sensor design was implemented using a commercial 0.18μm CMOS process. The temperature sensor achieves an inaccuracy of -0.34°C to 0.27°C from -30°C to 80°C. At 36°C, the PSR is around -50dB at 1MHz and -89.5dB at DC.
Citations: 1
An Improved Thermal Model for Static Optimization of Application Mapping and Scheduling in Multiprocessor System-on-Chip
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.40
Juan Yi, Weichen Liu, Weiwen Jiang, Mingwen Qin, Lei Yang, Duo Liu, Chunming Xiao, Luelue Du, E. Sha
With the increasing power density and number of cores integrated into a single chip, thermal management is widely recognized as one of the essential issues in Multi-Processor Systems-on-Chip (MPSoCs). An uncontrolled temperature could significantly decrease system performance, lead to high cooling and packaging costs, and even cause serious damage. These issues have made temperature one of the major factors that must be addressed in MPSoC designs. Static scheduling of applications should take the thermal effects of task executions into consideration to keep the chip temperature under a safety threshold. However, inaccurate temperature estimation would cause processor overheating or system performance degradation. In this paper, we propose an improved thermal modeling technique that can be used to predict the chip temperature more accurately and efficiently at design time. We further develop a simulated annealing (SA)-based algorithm to address the static application mapping and scheduling problem based on the improved thermal model. The thermal condition is greatly improved and the total energy consumption is minimized. Experimental results show that the improved thermal modeling technique provides, on average, over 99% accuracy in temperature prediction when compared with the results of HotSpot simulations. Based on it, the SA-based algorithm could reduce the chance that the temperature threshold is violated at runtime by 24.3%.
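To show what a design-time temperature check of a static schedule can look like, here is a sketch using a generic single-node lumped RC thermal model, C·dT/dt = P − (T − T_amb)/R, integrated with forward Euler. This is not the improved thermal model from the paper, which the abstract does not describe; the function names and all parameter values (thermal resistance, capacitance, ambient temperature) are hypothetical.

```python
def simulate_temperature(powers, t_amb=45.0, r_th=2.0, c_th=0.5, dt=0.01):
    """Forward-Euler integration of C*dT/dt = P - (T - T_amb)/R for one core.

    powers: power draw in watts at each time step of the schedule.
    Returns the temperature trace in degrees Celsius, starting from ambient.
    """
    temp = t_amb
    trace = []
    for p in powers:
        temp += dt / c_th * (p - (temp - t_amb) / r_th)
        trace.append(temp)
    return trace

def violates_threshold(trace, t_max):
    """True if the schedule ever pushes the core above the safety threshold."""
    return any(t > t_max for t in trace)

# A task drawing a constant 10 W for 20 s (2000 steps of 10 ms).
trace = simulate_temperature([10.0] * 2000)
print(round(trace[-1], 3), violates_threshold(trace, 60.0))
```

With these values the temperature settles toward T_amb + P·R = 65°C. A scheduler such as the simulated-annealing search mentioned in the abstract could call a check of this kind to reject candidate mappings that exceed the threshold.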
Citations: 7
Neuromemristive Extreme Learning Machines for Pattern Classification
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.67
Cory E. Merkel, D. Kudithipudi
This paper presents a novel neuromemristive architecture for pattern classification based on extreme learning machines (ELMs). Specifically, we propose CMOS current-mode neuron circuits, memristor-based bipolar synapse circuits, and a stochastic, hardware-friendly training approach based on the least-mean-squares (LMS) learning algorithm. These components are integrated into a current-mode ELM architecture. We show that the current-mode design is especially efficient for implementing constant network weights between the ELM's input and hidden layers. The neuromemristive ELM was simulated in the Cadence AMS design environment. We used a memristor model based on experimental data from an HfO_x device. The top-level design was validated by training a network with 10 hidden nodes to detect edges in binary patterns. Results indicate that the proposed architecture and learning approach are able to yield 100% classification accuracy.
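The structure this abstract describes, constant random weights into the hidden layer and LMS training of the output weights only, can be sketched in a few lines of plain Python. This is ordinary software LMS, not the stochastic hardware-friendly variant or the memristor circuits from the paper; the function names, the 4-bit input pattern, the learning rate, and the random seed are all hypothetical.

```python
import math
import random

def hidden_layer(x, w_in):
    """Fixed (untrained) random projection followed by a sigmoid."""
    return [1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(row, x))))
            for row in w_in]

def lms_step(w_out, h, target, lr):
    """One least-mean-squares update of the output weights only.

    Returns the new weights and the error measured before the update.
    """
    y = sum(w * hi for w, hi in zip(w_out, h))
    err = target - y
    return [w + lr * err * hi for w, hi in zip(w_out, h)], err

rng = random.Random(0)
# Input-to-hidden weights stay constant, as in an ELM: 10 hidden nodes, 4 inputs.
w_in = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
w_out = [0.0] * 10

h = hidden_layer([1, 0, 0, 1], w_in)
_, first_err = lms_step(w_out, h, 1.0, 0.05)
for _ in range(100):
    w_out, last_err = lms_step(w_out, h, 1.0, 0.05)
print(first_err, last_err)
```

Only `w_out` changes during training, which is why constant input-side weights are attractive to implement in hardware.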
Citations: 12
Towards an Effective Utilization of Partially Defected Interconnections in 2D Mesh NoCs
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.70
Changlin Chen, S. Cotofana
In typical NoC systems, most Routing Algorithms (RAs) abandon the interconnection between two adjacent routers if one traffic direction is broken, regardless of whether the other direction is still functional. In this paper, we propose a distributed logic based RA, which can efficiently utilize the UnPaired Functional (UPF) links in such partially defected interconnects. The basic fault pattern tolerated by the proposed RA is a fault wall, which is composed of adjacent broken links with the same outgoing direction. Messages are routed around the fault walls along the misrouting contours of the broken links. The proposed RA requires at least 3 Virtual Channels (VCs) and dynamically reserves them for misrouted messages to avoid deadlock. Our experiments indicate that, for random and localized traffic patterns, we achieve an average saturation throughput 20% higher than the Solid Fault Region Tolerant (SFRT) RA, and 22% and 14% higher than the Ariadne routing table based RA, respectively. For the real applications sample and satell, our proposal requires a routing execution time at least 16% shorter than both SFRT and Ariadne. Synthesis results with Synopsys Design Compiler and TSMC 65nm technology indicate that embedding the proposed RA into a baseline router results in 11% area overhead, which is only 3% higher than that of SFRT. In contrast, Ariadne area overhead is 15% for an 8 × 8 NoC and increases to 21% for a 10 × 10 NoC.
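The notion of an unpaired functional link can be illustrated by modelling mesh links as directed edges, so that a fault in one direction does not disable the opposite direction. The sketch below pairs that model with plain dimension-order (XY) routing as a baseline; it is not the fault-wall routing algorithm proposed in the paper, and the function names and coordinates are hypothetical.

```python
def xy_next_hop(cur, dst):
    """Baseline dimension-order routing: correct X first, then Y."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:
        return (x + (1 if dx > x else -1), y)
    if y != dy:
        return (x, y + (1 if dy > y else -1))
    return None  # already at the destination

def link_usable(src, dst, broken):
    """Directed check: (src, dst) being broken does not disable (dst, src)."""
    return (src, dst) not in broken

# Only the eastbound direction between routers (0,0) and (1,0) is faulty.
broken = {((0, 0), (1, 0))}

hop = xy_next_hop((0, 0), (2, 1))
print(hop, link_usable((0, 0), hop, broken))   # eastbound hop is blocked
print(link_usable((1, 0), (0, 0), broken))     # westbound direction still works
```

A routing algorithm that treats the whole interconnection as dead would discard the working westbound direction as well; keeping faults per direction is what lets it remain in use.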
Citations: 3
A Low Latency Scalable 3D NoC Using BFT Topology with Table Based Uniform Routing
Pub Date : 2014-07-09 DOI: 10.1109/ISVLSI.2014.51
Avik Bose, P. Ghosal, S. Mohanty
Due to the limitations of traditional bus based systems, Network-on-Chip (NoC) has evolved as the most dominant technology in the paradigm of the communication-centric revolution, where, besides computation, communication between the cores is an indispensable aspect of a SoC. Furthermore, the emergence of three dimensional integrated circuits (3D-ICs) has resulted in better performance, functionality, and packaging density compared to traditional planar ICs. The amalgamation of these two technologies, the 3D NoC architecture, can combine the benefits of these two new domains to offer an unprecedented performance gain. In this paper, we present a new 3D topological NoC design based on the butterfly fat tree (BFT) topology with an efficient table based uniform routing algorithm for 3D NoC. Extensive simulation experiments have been performed for BFT and compared to mesh, torus, butterfly, and flattened butterfly topologies against four performance metrics, viz. overall average latency, overall average acceptance rate, overall minimum acceptance rate, and average hop count. There are significant latency improvements of 43-89%, 83-88%, 46-96%, and 31-95% over the other topologies, respectively. Average hop count is improved by 30% and 13% over mesh and torus. Also, there are improvements in average acceptance rate and minimum acceptance rate of 1-8% and 5-14%, respectively, over flattened butterfly, and of 6-9% and 6-13% over torus. The results show that BFT is a very good choice for low network latency and faster communication.
Citations: 14