
2011 IEEE 29th International Conference on Computer Design (ICCD): Latest Publications

TAP prediction: Reusing conditional branch predictor for indirect branches with Target Address Pointers
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081386
Zichao Xie, Dong Tong, Mingkai Huang, Xiaoyin Wang, Qinqing Shi, Xu Cheng
Indirect-branch prediction is becoming more important for modern processors as more programs are written in object-oriented languages. Previous hardware-based indirect-branch predictors generally require significant hardware storage or use aggressive algorithms that make the processor front-end more complex. In this paper, we propose a fast and cost-efficient indirect-branch prediction strategy, called Target Address Pointer (TAP) Prediction. TAP Prediction reuses the history-based branch direction predictor to detect occurrences of indirect branches, and then stores indirect-branch targets in the Branch Target Buffer (BTB). The key idea of TAP Prediction is to predict Target Address Pointers, which generate virtual addresses that index the targets stored in the BTB, rather than to predict the indirect-branch targets directly. TAP Prediction also reuses the branch direction predictor to construct several small predictors. When an indirect branch is fetched, these small predictors work in parallel to generate the target address pointer. TAP Prediction then accesses the BTB with the generated virtual address to fetch the predicted indirect-branch target. This mechanism can achieve a time cost comparable to that of dedicated-storage predictors without requiring large amounts of additional storage. Our evaluation shows that for three representative direction predictors (Hybrid, Perceptrons, and O-GEHL), TAP schemes improve performance by 18.19%, 21.52%, and 20.59%, respectively, over a baseline processor with the most commonly used BTB prediction. Compared with previous hardware-based indirect-branch predictors, the TAP-Perceptrons scheme achieves a performance improvement equivalent to that of a 48KB TTC predictor, and it also outperforms the VPC predictor by 14.02%.
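A minimal functional sketch of the pointer-based idea follows (the class name, table sizes, default counter values, and allocation policy are our own simplifications, not the paper's design): counter tables of the kind a conditional-branch direction predictor already provides predict each bit of a small pointer, the pointer is combined with the branch PC to form a virtual address, and that address indexes a dictionary standing in for the BTB.

```python
class TAPPredictorSketch:
    def __init__(self, num_pointer_bits=2):
        self.num_pointer_bits = num_pointer_bits
        self.btb = {}  # virtual address -> target; a dict stands in for the real BTB
        # One small history-indexed counter table per pointer bit, the kind of
        # state a conditional-branch direction predictor already provides.
        self.bit_tables = [dict() for _ in range(num_pointer_bits)]

    def _counter(self, table, pc, history):
        return table.get((pc, history), 2)  # 2-bit saturating counter, default weakly set

    def predict(self, pc, history):
        # In hardware the small predictors work in parallel; here we simply loop.
        pointer = 0
        for i, table in enumerate(self.bit_tables):
            if self._counter(table, pc, history) >= 2:
                pointer |= 1 << i
        virtual_addr = (pc << self.num_pointer_bits) | pointer
        return self.btb.get(virtual_addr)  # predicted indirect-branch target, or None

    def update(self, pc, history, actual_target):
        # Find (or naively allocate) the pointer slot whose BTB entry holds the
        # actual target, then train each bit table toward that slot's bit pattern.
        slot = None
        for s in range(1 << self.num_pointer_bits):
            if self.btb.get((pc << self.num_pointer_bits) | s) == actual_target:
                slot = s
                break
        if slot is None:
            slot = 0  # simplistic allocation policy, for the sketch only
            self.btb[(pc << self.num_pointer_bits) | slot] = actual_target
        for i, table in enumerate(self.bit_tables):
            c = self._counter(table, pc, history)
            table[(pc, history)] = min(c + 1, 3) if (slot >> i) & 1 else max(c - 1, 0)
```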
Cited by: 6
Improving GPU Robustness by making use of faulty parts
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081422
Artem Durytskyy, M. Zahran, R. Karri
With hundreds of processing units in current state-of-the-art graphics processing units (GPUs), the probability that one or more processing units fail due to permanent faults, during fabrication or post-deployment, increases drastically. In our experiments we found that the loss of a single streaming multiprocessor (SM) in an 8-SM GPU resulted in as much as 16% performance loss. The default method for dealing with faulty SMs is to turn them off. Although faulty SMs cannot be trusted to execute a kernel (the program assigned to an SM) completely correctly, we show that we can still make use of these SMs to improve system throughput by having them generate and supply high-level hints to other functional SMs. By making the faulty SMs supply hints to functional SMs, we achieve an average speed-up of about 16% over the baseline case in which the faulty SMs are turned off. The proposed technique requires minimal hardware overhead and is highly scalable.
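A hypothetical functional sketch of the hint mechanism follows (the abstract does not specify the hint format, so the prefetch-style hints, the queue, and all names here are assumptions): a partially faulty SM runs ahead and publishes advisory hints that healthy SMs may consume or ignore, so correctness never depends on the faulty unit.

```python
from collections import deque

class HintQueue:
    """Advisory hints from a faulty SM; dropping or ignoring them is always safe."""
    def __init__(self, capacity=64):
        self.entries = deque(maxlen=capacity)

    def publish(self, hint):
        self.entries.append(hint)

    def consume(self):
        return self.entries.popleft() if self.entries else None

def faulty_sm_run_ahead(addresses, hints):
    # The faulty SM only produces hints; the final result never depends on it.
    for addr in addresses:
        hints.publish(("prefetch", addr))

def healthy_sm_execute(work_items, hints, prefetch, process):
    results = []
    for item in work_items:
        hint = hints.consume()
        if hint and hint[0] == "prefetch":
            prefetch(hint[1])          # a wrong hint only wastes bandwidth, never correctness
        results.append(process(item))  # correctness comes from the healthy SM alone
    return results
```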
Cited by: 8
Designing 3D test wrappers for pre-bond and post-bond test of 3D embedded cores
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081381
D. L. Lewis, Shreepad Panth, Xin Zhao, S. Lim, H. Lee
3D integration is a promising new technology for tightly integrating multiple active silicon layers into a single chip stack. Both the integration of heterogeneous tiers and the partitioning of functional units across tiers lead to significant improvements in functionality, area, performance, and power consumption. Managing the complexity of 3D design is a significant challenge that will require a system-on-chip approach, but the application of SOC design to 3D necessitates extensions to current test methodology. In this paper, we propose extending test wrappers, a popular SOC DFT technique, into the third dimension. We develop an algorithm employing the Best Fit Decreasing and Kernighan-Lin partitioning heuristics to produce 3D wrappers that minimize test time, maximize reuse of routing resources across test modes, and allow for different TAM bus widths in different test modes. On average, the two variants of our algorithm reuse 93% and 92% of the test wrapper wires while delivering test times just 0.06% and 0.32% above the minimum.
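As an illustration of the Best Fit Decreasing step only (the paper's full algorithm additionally partitions wrapper cells across tiers with a Kernighan-Lin pass and handles pre-bond/post-bond modes; the chain lengths below are made up), core scan chains can be assigned to a fixed number of wrapper chains so that the longest chain, and hence the scan-in depth that dominates test time, stays small:

```python
def best_fit_decreasing(scan_chain_lengths, tam_width):
    """Assign core scan chains to `tam_width` wrapper chains, keeping them balanced."""
    wrapper_chains = [[] for _ in range(tam_width)]
    loads = [0] * tam_width
    for length in sorted(scan_chain_lengths, reverse=True):   # "decreasing"
        best = min(range(tam_width), key=lambda i: loads[i])  # best fit = currently shortest
        wrapper_chains[best].append(length)
        loads[best] += length
    return wrapper_chains, max(loads)  # scan-in depth is set by the longest wrapper chain

chains, depth = best_fit_decreasing([120, 95, 80, 80, 60, 40, 25], tam_width=3)
print(chains, depth)  # depth 180 with these example lengths
```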
Cited by: 19
Fast and compact binary-to-BCD conversion circuits for decimal multiplication
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081401
O. Al-Khaleel, Zakaria Al-Qudah, M. Al-khaleel, C. Papachristou, F. Wolff
Decimal arithmetic has received considerable attention recently due to its suitability for many financial and commercial applications. In particular, numerous algorithms have recently been proposed for decimal multiplication. A major approach to decimal multiplication shaped by these proposals is based on performing the decimal digit-by-digit multiplication in binary, converting the binary partial product back to decimal, and then adding the decimal partial products as appropriate to form the final product in decimal. With this approach, the efficiency of binary-to-BCD partial product conversion is critical for the efficiency of the overall multiplication process. A recently proposed algorithm for this conversion is based on splitting the binary partial product into two parts (i.e., two groups of bits) and then computing the contributions of the two parts to the partial BCD result in parallel. This paper proposes two new algorithms (Three-Four split and Four-Three split) based on this principle. We present architectures that implement these algorithms and compare them to existing algorithms. The synthesis results show that the Three-Four split algorithm runs 15% faster and occupies 26.1% less area than the best-performing equivalent circuit found in the literature. Furthermore, the Four-Three split algorithm occupies 37.5% less area than the state-of-the-art equivalent circuit.
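The split principle can be checked with plain integer arithmetic. The sketch below illustrates a Three-Four style split on a single digit-by-digit product (0..81, 7 bits); the lookup tables and helper names are ours, and real hardware would realize them as small combinational converters feeding a BCD adder:

```python
# Upper three bits carry weight 16 (values 0, 16, ..., 80); lower four bits carry weight 1.
UPPER_TO_BCD = {u: ((u * 16) // 10 << 4) | (u * 16) % 10 for u in range(6)}
LOWER_TO_BCD = {l: (l // 10 << 4) | l % 10 for l in range(16)}

def bcd_add_2digit(a, b):
    """Add two packed-BCD values below 100 with the usual +6 decimal correction."""
    ones = (a & 0xF) + (b & 0xF)
    carry = 1 if ones > 9 else 0
    ones -= 10 * carry
    tens = (a >> 4) + (b >> 4) + carry  # stays below 10 because the sum is at most 81
    return (tens << 4) | ones

def binary_product_to_bcd(product):
    assert 0 <= product <= 81  # a single BCD digit-by-digit product
    return bcd_add_2digit(UPPER_TO_BCD[product >> 4], LOWER_TO_BCD[product & 0xF])

# Exhaustive check over all digit pairs: the split conversion matches packed BCD.
assert all(binary_product_to_bcd(a * b) == ((a * b) // 10 << 4) | (a * b) % 10
           for a in range(10) for b in range(10))
```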
Cited by: 21
Energy aware task mapping algorithm for heterogeneous MPSoC based architectures
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081444
A. Hussien, A. Eltawil, R. Amin, Jim Martin
Energy management for multi-mode Software Defined Radio (SDR) systems remains a daunting challenge. In this paper, we focus on the issue of task allocation for multi-processor systems with hybrid processing resources that can be reconfigured. With the objective of minimizing energy, we propose a fast, energy-aware static task mapping heuristic that minimizes the average overall energy consumption. Simulation results show that the proposed heuristic achieves results within 20% of the optimal solution while providing orders-of-magnitude speedup in processing time.
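For illustration only, a generic greedy static mapping of this flavor (not the paper's heuristic; the profiled energy/time tables, the PE names, and the load budget are assumed inputs) might look like this:

```python
def greedy_energy_mapping(tasks, pes, energy, time, load_budget):
    """Return {task: pe}; `energy` and `time` are profiled tables indexed [task][pe]."""
    mapping, load = {}, {p: 0.0 for p in pes}
    # Place the most energy-hungry tasks first so late placements still have options.
    for task in sorted(tasks, key=lambda t: min(energy[t][p] for p in pes), reverse=True):
        feasible = [p for p in pes if load[p] + time[task][p] <= load_budget]
        if not feasible:
            raise ValueError(f"no feasible PE for task {task}")
        best = min(feasible, key=lambda p: energy[task][p])
        mapping[task] = best
        load[best] += time[task][best]
    return mapping

# Hypothetical profiled numbers for two reconfigurable processing elements.
energy = {"fft": {"dsp": 3.0, "fpga": 1.8}, "viterbi": {"dsp": 2.5, "fpga": 1.2},
          "fir": {"dsp": 0.9, "fpga": 1.1}}
time = {"fft": {"dsp": 4.0, "fpga": 2.5}, "viterbi": {"dsp": 3.0, "fpga": 2.0},
        "fir": {"dsp": 1.0, "fpga": 1.5}}
print(greedy_energy_mapping(["fft", "viterbi", "fir"], ["dsp", "fpga"],
                            energy, time, load_budget=5.0))
```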
Cited by: 9
A memristor-based memory cell using ambipolar operation
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081390
P. Junsangsri, F. Lombardi
This paper presents a novel memory cell consisting of a memristor and ambipolar transistors. Macroscopic models are utilized to characterize the operation of this memory cell. A detailed treatment of the two basic memory operations (write and read) with respect to memristor features is provided; in particular, emphasis is devoted to the threshold characterization of the memristance and the on/off states. Extensive simulation results are provided to assess performance in terms of write/read times, transistor scaling, and power dissipation. The simulation results show that the proposed memory cell achieves superior performance compared with other memristor-based cells found in the technical literature.
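The paper does not state which macroscopic model it uses, so as a stand-in the sketch below simulates the widely used linear ion-drift memristor model, whose memristance is a weighted mix of R_on and R_off and whose internal state moves with the charge driven through the device; all parameter values are typical textbook numbers, not the authors':

```python
def simulate_memristor(current_fn, t_end, dt=1e-4,
                       r_on=100.0, r_off=16e3, d=10e-9, mu_v=1e-14, w0=5e-9):
    """Linear ion-drift model: M = R_on*w/D + R_off*(1 - w/D), dw/dt = mu_v*R_on/D * i(t)."""
    w, t, trace = w0, 0.0, []
    while t < t_end:
        i = current_fn(t)
        m = r_on * (w / d) + r_off * (1 - w / d)   # instantaneous memristance
        trace.append((t, m))
        w += mu_v * (r_on / d) * i * dt
        w = min(max(w, 0.0), d)                    # the state is bounded by the device length
        t += dt
    return trace

# A sustained +100 uA write current drives the cell from a mid state toward R_on.
trace = simulate_memristor(lambda t: 100e-6, t_end=0.5)
print(round(trace[0][1]), round(trace[-1][1]))     # roughly 8050 ohms down to roughly 100 ohms
```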
Cited by: 15
Blue team red team approach to hardware trust assessment
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081410
Jeyavijayan Rajendran, V. Jyothi, R. Karri
Hardware security techniques are validated using fixed in-house methods. However, such techniques may not be as effective in the field, where attacks are dynamic. A red team blue team approach mimics dynamic attack scenarios and can therefore be used to validate such techniques by determining the effectiveness of a defense and identifying vulnerabilities in it. By following a red team blue team approach, we validated two trojan detection techniques, namely path delay measurement and ring oscillator frequency monitoring, in the Embedded Systems Challenge (ESC) 2010. In ESC, one team performed the blue team activities and eight other teams performed red team activities. The path delay measurement technique detected all the trojans. The ESC exposed a vulnerability in the RO-based technique that was exploited by the red teams, causing some trojans to go undetected. After ESC, we developed a technique to fix this vulnerability.
Cited by: 33
Enhanced symbolic simulation of a round-robin arbiter
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081383
Yongjian Li, Naiju Zeng, W. Hung, Xiaoyu Song
In this work, we present our results on formally verifying the hardware design of a round-robin arbiter, which is a core component of many real network systems. Our approach is enhanced STE, which applies fully symbolic simulation not only to a single round of round-robin arbitration but also to the sequential behavior of the arbiter. Our experiments demonstrate that the enhanced STE specification for a real-world hardware design can be verified automatically within reasonable time and memory usage.
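For readers unfamiliar with the design under verification, the following is an illustrative functional reference model of a round-robin arbiter (our own toy model, not the verified RTL or the STE specification): the granted requester becomes the lowest-priority one for the next arbitration round.

```python
class RoundRobinArbiter:
    def __init__(self, n):
        self.n = n
        self.last = n - 1  # so requester 0 has the highest priority initially

    def arbitrate(self, requests):
        """requests: list of n booleans; returns the granted index, or None if idle."""
        assert len(requests) == self.n
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # sequential behavior: priority rotates past the winner
                return idx
        return None

arb = RoundRobinArbiter(4)
print([arb.arbitrate([True, True, False, True]) for _ in range(4)])  # [0, 1, 3, 0]
```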
Cited by: 5
Simultaneous continual flow pipeline architecture
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081387
K. Jothi, Mageda Sharafeddine, Haitham Akkary
Since the introduction of the first industrial out-of-order superscalar processors in the 1990s, instruction buffers and cache sizes have kept increasing with every new generation of out-of-order cores. The motivation behind this continuous evolution has been the performance of single-thread applications. Performance gains from larger instruction buffers and caches come at the expense of area, power, and complexity. We show that this is not the most energy-efficient way to achieve performance. Instead, sizing the instruction buffers to the minimum necessary for the common case of L1 data cache hits and using a new latency-tolerant microarchitecture to handle loads that miss the L1 data cache improves execution time and energy consumption on SpecCPU 2000 benchmarks by an average of 10% and 12%, respectively, compared to a large superscalar baseline. Our non-blocking architecture outperforms other latency-tolerant architectures, such as Continual Flow Pipelines, by up to 15% on the same SpecCPU 2000 benchmarks.
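A toy model of the latency-tolerance idea is sketched below (continual-flow-style slicing in general, not the paper's simultaneous CFP design; the instruction encoding and single-outstanding-miss handling are our simplifications): instructions dependent on an L1-missing load are drained into a slice buffer instead of occupying the small instruction buffers, and re-enter once the miss data returns.

```python
from collections import deque

def execute_with_slicing(instructions, l1_hit):
    """instructions: list of (dest, srcs, is_load); l1_hit(dest) -> True if the load hits."""
    slice_buf, poisoned, completed = deque(), set(), []
    for dest, srcs, is_load in instructions:
        misses = is_load and not l1_hit(dest)
        if misses or any(s in poisoned for s in srcs):
            poisoned.add(dest)                 # the whole miss-dependent slice is deferred
            slice_buf.append((dest, srcs, is_load))
        else:
            completed.append(dest)             # independent instructions keep flowing
    # Once the miss data returns, the deferred slice re-enters in order and completes.
    completed.extend(dest for dest, _, _ in slice_buf)
    return completed

# r1's load misses, so r2 (which depends on r1) is deferred while r3 proceeds.
program = [("r1", [], True), ("r2", ["r1"], False), ("r3", [], False)]
print(execute_with_slicing(program, l1_hit=lambda dest: dest != "r1"))  # ['r3', 'r1', 'r2']
```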
Cited by: 8
Using content-aware bitcells to reduce static energy dissipation
Pub Date : 2011-10-09 DOI: 10.1109/ICCD.2011.6081375
Fahrettin Koc, O. Simsek, O. Ergin
Static energy dissipation is a growing problem in contemporary processor design as feature sizes shrink. Many schemes have been proposed in the literature to cope with leakage, ranging from sleep transistors to lowered supply voltages. In this paper, we introduce a Conscious SRAM (CSRAM) design to lower static energy dissipation in the storage components of a processor. The proposed bitcell design adapts the body bias of its own transistors according to its contents. We show that the use of the proposed CSRAM cells results in a significant reduction in the static energy dissipation of on-chip storage components without significant performance degradation. To reduce the area overhead introduced by the CSRAM, we propose a simplified version of the cell at the circuit level. We also leverage the fact that the contents of adjacent bits of stored values are highly dependent on each other, especially in the upper-order bits of a value, and propose architectural-level solutions that lower the area overhead to as low as 7%.
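The claim about correlated adjacent bits is easy to check empirically; the short script below (with a made-up value set) shows that upper-order adjacent bits of typical small or sign-extended integers almost always match, while low-order bits do not:

```python
def adjacent_bit_match_rate(values, bit):
    """Fraction of values whose bits `bit` and `bit+1` are equal."""
    same = sum(((v >> bit) & 1) == ((v >> (bit + 1)) & 1) for v in values)
    return same / len(values)

# A made-up mix of small positive and negative 32-bit integers.
values = [v & 0xFFFFFFFF for v in (-3, -1, 0, 1, 2, 5, 17, 100, -250, 4096)]
print("bits 30/31:", adjacent_bit_match_rate(values, 30))  # 1.0: upper bits track each other
print("bits 0/1:  ", adjacent_bit_match_rate(values, 0))   # 0.4: low bits are far less correlated
```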
Cited by: 5