
2016 International Great Lakes Symposium on VLSI (GLSVLSI): Latest Publications

Approximate differential encoding for energy-efficient serial communication
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902974
D. J. Pagliari, E. Macii, M. Poncino
Embedded computing systems include several off-chip serial links, which are typically used to interface processing elements with peripherals such as sensors, actuators and I/O controllers. Because of the long physical lines of these connections, they can contribute significantly to the total energy consumption. On the other hand, many embedded applications are error resilient, i.e. they can tolerate intermediate approximations without a significant impact on the final quality of results. This feature can be exploited in serial buses to explore the trade-off between data approximations and energy consumption. We propose a simple yet very effective approximate encoding for reducing dynamic energy in serial buses. Our approach uses differential encoding as a baseline scheme, and extends it with bounded approximations to overcome the intrinsic limitations of differential encoding for data with low temporal correlation. We show that the encoder and decoder for this algorithm can be implemented in hardware with no throughput loss and truly marginal power overheads. Despite this simplicity, our approach is superior to state-of-the-art approximate encodings, and for realistic inputs it reaches up to 95% power reduction with <1% average error on decoded data.
Citations: 10
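To make the idea of differential encoding with bounded approximation concrete, here is a minimal sketch. It is not the authors' encoder: the rule "send a zero difference whenever the true difference is within max_err", the 16-bit word width, the MSB-first serialization and the names encode, decode and count_transitions are all assumptions made for illustration. The encoder mirrors the decoder's reconstructed value so approximation errors cannot accumulate, which keeps every decoded sample within max_err of the original.

```python
# Sketch: differential encoding with a bounded approximation (hypothetical scheme,
# not the paper's exact algorithm). Assumes unsigned samples in [0, 2**(WIDTH-1))
# so every difference fits in a WIDTH-bit two's-complement word.
WIDTH = 16
MASK = (1 << WIDTH) - 1


def encode(samples, max_err):
    """Send the difference from the decoder's last value; zero out small ones."""
    ref = 0  # encoder mirrors the decoder's reconstruction to avoid drift
    words = []
    for x in samples:
        diff = x - ref
        if abs(diff) <= max_err:
            diff = 0  # approximation: decoder repeats ref, so error <= max_err
        words.append(diff & MASK)  # two's-complement word on the wire
        ref += diff
    return words


def decode(words):
    """Rebuild samples by accumulating the signed differences."""
    ref = 0
    out = []
    for w in words:
        diff = w - (1 << WIDTH) if w & (1 << (WIDTH - 1)) else w
        ref += diff
        out.append(ref)
    return out


def count_transitions(words):
    """Bit toggles on a serial line when words are sent MSB first."""
    bits = [(w >> i) & 1 for w in words for i in range(WIDTH - 1, -1, -1)]
    return sum(a != b for a, b in zip(bits, bits[1:]))


if __name__ == "__main__":
    samples = [1000 + (i * 37) % 11 for i in range(200)]  # synthetic, weakly correlated
    for e in (0, 3):
        words = encode(samples, e)
        worst = max(abs(a - b) for a, b in zip(samples, decode(words)))
        print(f"max_err={e}: transitions={count_transitions(words)}, worst error={worst}")
```

Transition count is used here only as a rough proxy for dynamic energy on a serial line. Raising max_err trades decoded-data accuracy for more all-zero words.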
Exploratory power noise models of standard cell 14, 10, and 7 nm FinFET ICs
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903035
Ravi Patel, Kan Xu, E. Friedman, P. Raghavan
The physical dimensions of standard cells constrain the dimensions of power networks, affecting the on-chip power noise. An exploratory modeling methodology is presented for estimating power noise in advanced technology nodes. The models are evaluated for 14, 10, and 7 nm technologies to assess the impact on performance. Scaled technologies are shown to be more sensitive to power noise, potentially eroding the performance enhancements achieved by scaling. Stripes between local track rails are evaluated as a means to reduce power noise, exhibiting up to 56.5% improvement in power noise for the 7 nm technology node. A strong dependence on stripe width is observed, indicating that fewer wide stripes are more favorable than many thin stripes. As a promising alternative material for power network interconnects, graphene is shown to exhibit good potential in reducing power noise. The effects of different scaling scenarios of local power rails on power noise are also discussed.
Citations: 0
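For intuition on why extra stripes lower power noise, the sketch below uses a textbook first-order resistive model. It is not the paper's exploratory methodology. It assumes a rail with uniform load current density i (A/m) and resistance r (ohm/m), fed from an ideal supply at both ends of each segment, and it treats stripes as ideal straps that split the rail into equal segments. Inductive noise and stripe width, which the abstract finds important, are not modeled. The function name max_ir_drop is hypothetical.

```python
# First-order IR-drop sketch (hypothetical model, not the paper's methodology).
def max_ir_drop(i_per_m, r_per_m, length_m, segments=1):
    """Peak IR drop on a uniformly loaded rail fed at both ends of each segment.

    For a segment of length s, the drop at position x is V(x) = i*r*x*(s - x)/2,
    which peaks at the segment centre with value i*r*s**2/8.
    `segments` models (segments - 1) extra ideal straps at even spacing.
    """
    s = length_m / segments
    return i_per_m * r_per_m * s ** 2 / 8


if __name__ == "__main__":
    for k in (1, 2, 4):
        # Illustrative values only: 2 A/m load, 50 ohm/m rail, 100 um span.
        print(k, max_ir_drop(2.0, 50.0, 100e-6, segments=k))
```

Because the peak drop grows with the square of the unstrapped length, halving that length cuts the resistive drop by about 4x in this idealized model.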
Asynchronous high speed serial links analysis using integrated charge for event detection
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902998
Aditya Dalakoti, Carrie Segal, Merritt Miller, F. Brewer
We present a metric for event detection, targeted for the analysis of CMOS asynchronous serial data links. Our metric is used to analyze signaling strategies that allow for coincident or nearly coincident detection of both data and event timing. The metric predicts that the CMOS link signaling mechanism has substantial implicit dispersion and intersymbol interference (ISI) tolerance when compared to conventionally timed links. In fact, it predicts correct link operation in situations where eye-diagram techniques predict link failure. Practical operation margins and metrics are described and evaluated for PCB and cabling solutions, suggesting that 10+ Gb/s low-power asynchronous links could be implemented in 130 nm CMOS technology.
Citations: 2
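The sketch below illustrates why integrating charge can tolerate dispersion that closes an eye diagram: a pulse smeared over a longer interval still delivers the same charge. It is a generic illustration, not the authors' metric. The function detect_events, the trapezoidal integration, the fixed threshold q_th and the reset-after-event rule are all assumptions. A real receiver would also need to handle the residual charge left between events, which this sketch ignores.

```python
# Hypothetical charge-integrating event detector (illustration, not the paper's metric).
def detect_events(current, dt, q_th):
    """Return (time, bit) events from sampled receiver current.

    Charge is accumulated with the trapezoidal rule; an event fires when |Q|
    reaches q_th, the sign of Q gives the bit, and the integrator is reset.
    Residual charge left after the last event is simply ignored here.
    """
    q = 0.0
    events = []
    for n in range(1, len(current)):
        q += 0.5 * (current[n - 1] + current[n]) * dt
        if abs(q) >= q_th:
            events.append((n * dt, 1 if q > 0 else 0))
            q = 0.0
    return events


if __name__ == "__main__":
    sharp = [0.0] * 5 + [1.0] * 10 + [0.0] * 20      # short, tall pulse
    smeared = [0.0] * 5 + [0.25] * 40 + [0.0] * 20   # same charge, 4x wider
    print(detect_events(sharp, 1.0, 5.0))
    print(detect_events(smeared, 1.0, 5.0))
```

Both pulses produce exactly one positive event. The smeared pulse fires later, so event timing shifts while the data value is preserved.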
Capturing true workload dependency of BTI-induced degradation in CPU components
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902992
Dimitrios Stamoulis, S. Corbetta, D. Rodopoulos, P. Weckx, P. Debacker, B. Meyer, B. Kaczer, P. Raghavan, D. Soudris, F. Catthoor, Z. Zilic
Atomistic-based approaches accurately model Bias Temperature Instability (BTI) phenomena, but they suffer from long execution times, preventing their seamless integration in system-level analysis flows. In this paper we present a comprehensive flow that combines the accuracy of Capture Emission Time (CET) maps with the efficiency of the Compact Digital Waveform (CDW) representation. In this way, we capture the true workload-dependent BTI-induced degradation of selected CPU components. First, we show that existing works that assume constant stress patterns fail to account for workload dependency, leading to fundamental estimation errors. Second, we evaluate the impact of different real workloads on selected CPU sub-blocks from a commercial processor design. To the best of our knowledge, this is the first work that combines atomistic properties and true workload dependency for variability analysis.
Citations: 12
Survey of emerging technology based physical unclonable functions
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903044
Ilia A. Bautista Adames, J. Das, S. Bhanja
Authentication of electronic devices has become critical. Hardware authentication is one way to enhance the security of a chip. Together with software, it makes it harder for an intruder to access any computer, smartphone, or other device without authorization. One way of authenticating a device through hardware is to use fabrication anomalies, which are random and unclonable. This mechanism is called a Physical Unclonable Function (PUF). PUFs are easy to evaluate but hard to predict. PUFs have gained popularity over the past decade, as researchers started taking advantage of the randomness of electrical signals to build unique authentication blocks. This survey presents the state-of-the-art devices currently being investigated as PUFs. The different technologies are compared in terms of reproducibility, uniqueness, randomness, area, scalability, and compatibility with CMOS. Emphasis is put on technologies that are emerging and gaining commercial interest. Through these comparisons, we show their applicability to different environments.
Citations: 23
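The comparison criteria above are usually quantified with Hamming-distance metrics. The sketch below uses definitions that are common in the PUF literature; the survey's exact formulas may differ. Uniqueness is taken as the mean pairwise inter-chip Hamming distance, reproducibility as 100% minus the mean intra-chip distance across re-measurements, and randomness as uniformity, the fraction of 1s in a response. The function names and the list-of-bits response format are assumptions.

```python
# Common PUF quality metrics (sketch; the survey's exact definitions may differ).
from itertools import combinations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def uniqueness(responses):
    """Mean pairwise inter-chip Hamming distance in percent (ideal: 50%)."""
    n = len(responses[0])
    pairs = list(combinations(responses, 2))
    return 100.0 * sum(hamming(a, b) for a, b in pairs) / (n * len(pairs))


def reliability(reference, remeasured):
    """100% minus the mean intra-chip Hamming distance to a reference (ideal: 100%)."""
    n = len(reference)
    avg = sum(hamming(reference, r) for r in remeasured) / (n * len(remeasured))
    return 100.0 * (1.0 - avg)


def uniformity(response):
    """Percentage of 1s in one response, a simple randomness check (ideal: 50%)."""
    return 100.0 * sum(response) / len(response)


if __name__ == "__main__":
    chips = [[0, 0, 1, 1], [0, 1, 0, 1], [1, 1, 1, 1]]  # toy 4-bit responses
    print(uniqueness(chips), reliability([1, 0, 1, 0], [[1, 0, 1, 0], [1, 0, 1, 1]]))
```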
Secure and low-overhead circuit obfuscation technique with multiplexers
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903000
Xueyan Wang, Xiaotao Jia, Qiang Zhou, Yici Cai, Jianlei Yang, Mingze Gao, G. Qu
Circuit obfuscation techniques have been proposed to conceal a circuit's functionality in order to thwart reverse engineering (RE) attacks on integrated circuits (ICs). We believe that a good obfuscation method should have low design complexity and low performance overhead, yet impose high RE attack complexity. However, existing obfuscation techniques do not meet all these requirements. In this paper, we propose a polynomial obfuscation scheme which leverages specially designed multiplexers (MUXs) to replace judiciously selected logic gates. Candidate logic gates to be obfuscated are selected by a novel gate classification method which utilizes IC topological structure information. We show that this scheme is resilient to all known attacks and is therefore secure. Experiments are conducted on the ISCAS 85/89 and MCNC benchmark suites to evaluate the performance overhead due to obfuscation.
Citations: 24
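The basic principle of MUX-based obfuscation can be shown on a toy function: a gate is replaced by a multiplexer whose select line is a secret key bit, and the wrong key routes a decoy signal to the output. This is a generic key-controlled MUX sketch, not the paper's specially designed MUXs or its gate-selection method. The functions original, obfuscated, mux2 and equivalent and the choice of XOR as the decoy are hypothetical.

```python
# Generic key-controlled MUX obfuscation on a toy circuit (hypothetical example).
from itertools import product


def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, else d0."""
    return d1 if sel else d0


def original(a, b, c):
    return (a & b) | c


def obfuscated(a, b, c, key):
    # The AND gate is replaced by a MUX selecting between the true AND output
    # and a decoy XOR output; only key = 1 restores the original function.
    g = mux2(key, a ^ b, a & b)
    return g | c


def equivalent(key):
    """Exhaustively compare both circuits over all 3-bit input vectors."""
    return all(original(*v) == obfuscated(*v, key) for v in product((0, 1), repeat=3))


if __name__ == "__main__":
    print({k: equivalent(k) for k in (0, 1)})
```

With many such MUXs, an attacker without the key faces one candidate circuit per key assignment.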
A sampling clock skew correction technique for time-interleaved SAR ADCs
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903008
D. Prashanth, Hae-Seung Lee
A technique for sampling clock skew correction by adjusting the delay in the input signal to each channel in a time-interleaved (TI) ADC is proposed. A proof-of-concept TI ADC employing this technique was implemented in a 65 nm CMOS process. The four-way TI ADC operates at an effective sampling rate of 150 MS/s, and achieves 60.2 dB and 58.2 dB SNDR for an input signal frequency of 2.1 MHz and 74.1 MHz, respectively. The ADC consumes 12.4 mW from a 1.2 V supply and occupies an area of 0.9 mm².
Citations: 3
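Skew correction matters most at high input frequencies. A standard first-order estimate treats the rms timing skew between channels as a jitter-like error, so the skew alone caps the SNR at -20·log10(2π·f_in·σ_skew). The sketch below applies that estimate to the figures quoted above: 150 MS/s over four channels, and 58.2 dB at 74.1 MHz. It computes the rms skew at which skew by itself would cap SNR at that level. This is not the residual skew reported by the authors. The function names are hypothetical.

```python
# First-order skew-limited SNR estimate for a TI ADC (textbook approximation).
import math


def per_channel_rate(fs_total_hz, n_channels):
    """Each sub-ADC runs at the aggregate rate divided by the channel count."""
    return fs_total_hz / n_channels


def skew_limited_snr_db(f_in_hz, sigma_skew_s):
    """SNR ceiling set by rms timing skew alone for a full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_skew_s)


def max_skew_for_snr(f_in_hz, snr_db):
    """Largest rms skew that keeps the skew-only SNR ceiling at snr_db."""
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_in_hz)


if __name__ == "__main__":
    print(per_channel_rate(150e6, 4))          # samples/s per channel
    print(max_skew_for_snr(74.1e6, 58.2))      # rms skew in seconds
```

The result, on the order of a few picoseconds, gives a sense of how tightly channel timing must be matched near Nyquist.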
Task-resource co-allocation for hotspot minimization in heterogeneous many-core NoCs
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903003
Md Farhadur Reza, Dan Zhao, Hongyi Wu
To fully exploit the massive parallelism of many cores, this work tackles the problem of mapping large-scale applications onto heterogeneous on-chip networks (NoCs) to minimize the peak workload for energy hotspot avoidance. A task-resource co-optimization framework is proposed which configures the on-chip communication infrastructure and maps the applications simultaneously and coherently, aiming to minimize the peak load under constraints on computation power, communication capacity and the total cost budget of on-chip resources. The problem is first formulated as a linear programming model to search for the optimal solution. A heuristic algorithm is further developed for fast design space exploration in extremely large-scale many-core NoCs. Extensive simulations are carried out with real-world benchmarks and randomly generated task graphs to demonstrate the effectiveness and efficiency of the proposed schemes.
Citations: 13
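The peak-load objective can be illustrated with a simple greedy baseline. The sketch below uses a longest-processing-time-first heuristic: tasks are sorted by decreasing load, and each task is placed on the heterogeneous core whose resulting utilization (load divided by capacity) is lowest. This is a stand-in for illustration only. It is not the authors' LP model or heuristic, and it ignores the communication and resource-cost constraints they co-optimize. The name greedy_map and the load/capacity units are assumptions.

```python
# Greedy peak-utilization mapping on heterogeneous cores (generic LPT-style baseline).
def greedy_map(task_loads, core_capacities):
    """Return (mapping task -> core, peak utilization) using longest-task-first."""
    util = [0.0] * len(core_capacities)
    mapping = {}
    for t in sorted(range(len(task_loads)), key=lambda t: -task_loads[t]):
        # Pick the core whose utilization after adding task t is smallest.
        best = min(range(len(core_capacities)),
                   key=lambda c: util[c] + task_loads[t] / core_capacities[c])
        util[best] += task_loads[t] / core_capacities[best]
        mapping[t] = best
    return mapping, max(util)


if __name__ == "__main__":
    print(greedy_map([4, 3, 3, 2], [1, 1]))
    print(greedy_map([5, 4, 2, 2, 1], [2.0, 1.0, 1.0]))  # one faster core
```

An exact formulation, like the paper's LP model, can beat this greedy result, but the greedy pass is a fast upper bound for large design spaces.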
A design of a non-volatile PMC-based (programmable metallization cell) register file
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2903034
Salin Junsangsri, Jie Han, F. Lombardi
This paper presents the design of a non-volatile register file using cells made of an SRAM and a Programmable Metallization Cell (PMC). The proposed cell is a symmetric 8T2P (8 transistors, 2 PMCs) design; it utilizes three control lines to ensure correct operation (Write, Read, Store and Restore). Simulation results using HSPICE are provided for the cell as well as for the register file array (both one- and two-dimensional schemes). At cell level, it is shown that the off-state resistance has a limited effect on the Read time, because in the proposed circuit the transistor connecting the PMCs to the SRAM is off. While it has no significant effect on the Store time, the off-state resistance does determine the time of the Restore operation: an increase in off-state PMC resistance increases the Restore time. A comparison between non-volatile register files utilizing either PMCs or Phase Change Memories (PCMs) is provided. The register file using PMCs has faster Store and Read times than its PCM-based counterpart; this is mostly caused by the difference in resistance values of these two non-volatile technologies.
Citations: 1
Performance constraint-aware task mapping to optimize lifetime reliability of manycore systems
Pub Date : 2016-05-18 DOI: 10.1145/2902961.2902996
Vijeta Rathore, Vivek Chaturvedi, T. Srikanthan
Negative bias temperature instability (NBTI) has emerged as a critical challenge to the lifetime reliability of computing systems. Traditionally, temperature-aware methodologies are used to mitigate the impact of NBTI on the aging and degradation of computing systems. However, in the presence of process variation, which is the norm in manycore processors, temperature-aware techniques are inefficient at improving lifetime reliability and can result in poor performance. In this paper, we propose a novel performance constraint-aware task mapping technique that improves lifetime reliability by mitigating NBTI while considering on-chip process variation. Our approach consists of two phases, namely design time and run time. At design time, we generate Pareto-optimal mappings. At run time, our technique judiciously intervenes to perform workload migration that protects the weakest processing core. We compare our approach with performance-greedy and thermal-aware task mapping techniques.
Citations: 12
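The design-time step above keeps only Pareto-optimal mappings. The sketch below shows that filtering for two objectives to maximize, throughput and lifetime. It then adds a run-time selection rule that picks the most reliable mapping still meeting a throughput constraint. The Mapping record, the metric names and the selection rule are assumptions for illustration. The authors' actual reliability model and migration policy are not reproduced.

```python
# Pareto filtering of candidate task mappings (illustrative sketch).
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    name: str
    throughput: float  # higher is better
    lifetime: float    # e.g. an MTTF estimate; higher is better


def dominates(a: Mapping, b: Mapping) -> bool:
    """a is at least as good as b on both objectives and strictly better on one."""
    return (a.throughput >= b.throughput and a.lifetime >= b.lifetime
            and (a.throughput > b.throughput or a.lifetime > b.lifetime))


def pareto_front(cands):
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def pick(front, min_throughput) -> Optional[Mapping]:
    """Most reliable Pareto mapping meeting the throughput constraint, if any."""
    ok = [m for m in front if m.throughput >= min_throughput]
    return max(ok, key=lambda m: m.lifetime) if ok else None


if __name__ == "__main__":
    cands = [Mapping("A", 10, 5), Mapping("B", 8, 7),
             Mapping("C", 7, 6), Mapping("D", 6, 9)]
    front = pareto_front(cands)
    print([m.name for m in front], pick(front, 8))
```

Precomputing the front at design time keeps the run-time decision down to a cheap lookup.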