2008 IEEE International Conference on Computer Design: Latest Papers

Application Specific Instruction set processor specialized for block motion estimation
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751872
Marc-André Daigneault, J. Langlois, J. David
This paper presents a novel application specific instruction set processor specialized for block motion estimation. The proposed architecture includes a register file system that is efficient in terms of data reuse and parallel processing. Performance and area costs are presented for different levels of parallelism and register file dimensions. Various FPGA implementations of the architecture are further studied in order to present the most important factors affecting performance and hardware resource utilization. The proposed instruction extension block architecture enables acceleration by three orders of magnitude for full-search block matching algorithms.
Citations: 6
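The abstract above refers to full-search block matching without spelling it out. As a point of reference, here is a minimal software sketch of the textbook algorithm, assuming the common sum-of-absolute-differences (SAD) cost. The function names, the toy frames, and the search range are illustrative assumptions and are not taken from the paper; the paper's contribution is a hardware instruction extension that accelerates this kind of loop nest, which this sketch does not model.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Test every displacement within +/- search_range and return
    (best_dx, best_dy, best_sad). Candidates outside ref are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames (hypothetical data): cur is ref shifted right by one pixel,
# so the matching block in ref lies one pixel to the left.
ref = [[(3 * x + 7 * y) % 32 for x in range(8)] for y in range(8)]
cur = [[ref[y][x - 1] if x > 0 else 0 for x in range(8)] for y in range(8)]
motion = full_search(cur, ref, bx=2, by=2, n=4, search_range=2)
print(motion)  # (-1, 0, 0)
```

Every candidate displacement costs n x n absolute differences, and neighbouring candidates read almost the same reference pixels. That overlap is the kind of data reuse and parallelism the abstract attributes to its register file system.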
Quantifying the energy efficiency of coordinated micro-architectural adaptation for multimedia workloads
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751920
Shrirang M. Yardi, M. Hsiao
Adaptive micro-architectures aim to achieve greater energy efficiency by dynamically allocating computing resources to match the workload performance. The decisions of when to adapt (temporal dimension) and what to adapt (spatial dimension) are taken by a control algorithm based on an analysis of the power/performance tradeoffs in both dimensions. We perform a rigorous analysis to quantify the energy efficiency limits of fine-grained temporal and coordinated spatial adaptation of multiple architectural resources by casting the control algorithm as a constrained optimization problem. Our study indicates that coordinated adaptation can potentially improve energy efficiency by up to 60% as compared to static architectures and by up to 33% over algorithms that adapt resources in isolation. We also analyze synergistic application of coarse and fine grained adaptation and find modest improvements of up to 18% over optimized dynamic voltage/frequency scaling. Finally, we analyze several previous control algorithms to understand the underlying reasons for their inefficiency.
Citations: 0
Custom rotary clock router
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751849
V. Honkote, B. Taskin
Timing closure and power envelopes for contemporary multi-core chips with high speed clock networks make the clock distribution design a challenging task. Resonant rotary clocking is a novel clocking technology for multi-gigahertz rate clock generation that provides minimal power dissipation. Rotary clocking implementations can easily provide independent synchronization of multiple cores as well. The traditional rotary clock design involves a regular array topology of oscillatory rings. In this paper, the rotary clock networks are designed and implemented using a custom ring topology. Custom ring topologies are advantageous as they reduce the total tapping wirelength for the registers tapping onto the oscillatory rings. A maze router based algorithm is developed for the implementation of custom topology rotary rings. In experiments performed on UCLA IBM R1-R5 benchmark circuits with the Elmore delay model, an improvement of 11.04% for register tapping wirelength is achieved on average.
Citations: 14
Fault tolerant Four-State Logic by using Self-Healing Cells
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751832
T. Panhofer, W. Friesenbichler, M. Delvai
The trend towards higher integration and faster operating speed leads to decreasing feature sizes and lower supply voltages in modern integrated circuits. These properties make the circuits more error-prone, requiring a fault tolerant implementation for applications demanding high reliability, e.g. space missions. In previous work we presented a concept for obtaining fault tolerant digital circuits by using asynchronous four-state logic (FSL). This type of logic already exhibits a high degree of fault tolerance, where most faults simply halt the circuit (deadlock). The remaining types of faults are handled by temporal redundancy. Adding a deadlock detection unit and introducing the concept of self-healing cells (SHCs) leads to a highly reliable circuit that is able to tolerate even multiple faults. However, our experiments revealed that some specific fault constellations neither cause a deadlock nor are detected by a redundant calculation. We present two improved ways of error detection, which make it possible to capture even these types of faults. Further, the size of an SHC is compared against the fault tolerance achieved with respect to multiple faults.
Citations: 11
Design of application-specific 3D Networks-on-Chip architectures
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751853
Shan Yan, Bill Lin
The increasing viability of three dimensional (3D) silicon integration technology has opened new opportunities for chip design innovations, including the prospect of extending emerging systems-on-chip (SoC) design paradigms based on networks-on-chip (NoC) interconnection architectures to 3D chip designs. In this paper, we consider the problem of designing application-specific 3D-NoC architectures that are optimized for a given application. We present novel 3D-NoC synthesis algorithms that make use of accurate power and delay models for 3D wiring with through-silicon vias. In particular, we present a very efficient 3D-NoC synthesis algorithm called ripup-reroute-and-router-merging (RRRM), that is based on a rip-up and reroute formulation for routing flows and a router merging procedure for network optimization. Experimental results on 3D-NoC design cases show that our synthesis results can on average achieve a 74% reduction in power consumption and a 17% reduction in hop count over regular 3D mesh implementations and a 52% reduction in power consumption and a 17% reduction in hop count over optimized 3D mesh implementations.
Citations: 62
Safe clocking register assignment in datapath synthesis
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751850
Keisuke Inoue, M. Kaneko, T. Iwagaki
For recent and future nanometer-technology VLSIs, static and dynamic delay variations become a serious problem. In many cases, the hold constraint, as well as the setup constraint, becomes critical for latching a correct signal under delay variations. While a timing violation due to failure of the setup constraint can be fixed by tuning the clock frequency or using a delayed latch, a timing violation due to failure of the hold constraint cannot, in general, be fixed by those methods. Our approach to delay variations (in particular, the hold constraint) proposed in this paper is a novel register assignment strategy in high-level synthesis, which guarantees safe clocking by contra-data-direction (CDD) clocking. After formulating this new register assignment problem, we prove that it is NP-hard, and then derive an integer linear programming formulation for it. The proposed method receives a scheduled data flow graph and generates a datapath having (1) robustness against delay variations, which is ensured by CDD-based register assignment, and (2) the minimum possible number of registers. Experimental results show the effectiveness of the proposed method for some benchmark circuits.
Citations: 3
Applying speculation techniques to implement functional units
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751843
Alberto A. Del Barrio, M. Molina, J. Mendias, Esther Andres Perez, R. Hermida, F. Tirado
This paper justifies the use of carry estimation and prediction to increase the performance of functional units built by replicating full adders while keeping the area penalty low. Adders and multipliers are the most representative modules in this group of functional units. These design techniques allow the implementation of modules with performance improvements ranging from 20% to 50% with an area overhead of only around 5%. These functional units are suitable for asynchronous circuits, but they could also be introduced in synchronous circuits with speculative techniques. The basic idea consists in estimating the carry out from some parts of the functional units, allowing every part to operate independently and in parallel. These modules are connected to build bigger ones. Simulation results show that for some applications it is possible to make predictions even more accurate than the bit-based estimation. Predictions also have the advantage that they can be introduced in multiplier designs, whereas estimators cannot. These predictions are similar to those used for branch prediction in a processor.
Citations: 10
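The abstract above describes estimating the carry out of each part of an adder so that the parts can work independently. The following is a rough behavioural sketch of that general idea, not the authors' circuit. The estimator used here (guess a carry when the top bits of both operands in the previous block are 1) is an assumption chosen for illustration, as are the function name and the block size. In hardware the blocks would compute in parallel; the loop below only models the result and counts how often a guess has to be corrected.

```python
def speculative_add(a, b, width=16, block=4):
    """Add two unsigned width-bit integers block by block.

    Each block guesses its carry-in from the previous block's top operand bits
    (an assumed estimator) and is recomputed when the guess turns out wrong.
    Returns (sum modulo 2**width, number of mispredicted block carries).
    """
    mask = (1 << block) - 1
    result = 0
    true_carry = 0
    mispredictions = 0
    for i in range(0, width, block):
        a_blk = (a >> i) & mask
        b_blk = (b >> i) & mask
        if i == 0:
            guess = 0  # the lowest block has no carry-in
        else:
            # Assumed estimator: carry is likely if both previous top bits are 1.
            guess = ((a >> (i - 1)) & 1) & ((b >> (i - 1)) & 1)
        spec = a_blk + b_blk + guess  # speculative block result
        if guess != true_carry:
            mispredictions += 1
            spec = a_blk + b_blk + true_carry  # correction step
        result |= (spec & mask) << i
        true_carry = spec >> block
    return result, mispredictions


print(speculative_add(0x8888, 0x8888))  # (4368, 0): every guess was right
print(speculative_add(0x00FF, 0x0001))  # (256, 2): a long carry chain defeats the guess
```

The second call shows the cost of speculation: a carry that ripples through several blocks is invisible to a local estimator, so those blocks need the slower correction path.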
Leveraging speculative architectures for run-time program validation
Pub Date : 2008-10-01 DOI: 10.1145/2512456
Juan Carlos Martínez Santos, Yunsi Fei
Program execution can be tampered with by malicious attackers who exploit software vulnerabilities. Changing the program behavior by compromising control data and decision data has become the most serious threat to computer systems security. Although several hardware approaches have been presented to validate program execution, they mostly suffer from large hardware area or poor ambiguity handling. In this paper, we propose a new hardware-based approach by leveraging the existing speculative architectures for run-time program validation. The on-chip branch target buffer (BTB) is utilized as a cache of the legitimate control flow transfers stored in a secure memory region. In addition, the BTB is extended to store the correct program path information. At each indirect branch site, the BTB is used to validate the decision history of conditional branches before it, and more information about the future decision path is fetched to monitor the execution path at run-time. Implementation of this approach is transparent to the upper operating system and programs. Thus, it is applicable to legacy code. Due to good code locality of the executable programs and effectiveness of branch prediction, the frequency of run-time control flow validations against the secure off-chip memory is low. Our experimental results show a negligible performance penalty and small storage overhead with ambiguity reduced.
Citations: 8
In-field NoC-based SoC testing with distributed test vector storage
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751863
J. Lee, R. Mahapatra
The operational lifetimes of SoCs and microprocessors face growing threats from technology scaling and increasing device temperature and power density. In-field (or on-line) testing of NoC-based SoCs is an important technique in ensuring system integrity throughout this potentially shorter lifetime. Whether in-field testing is conducted concurrently with normal applications or executed in isolation, application intrusion must be minimized in order to maintain system availability. Specialized infrastructure IP has been proposed to manage on-line testing by scheduling tests and delivering test vectors to the various cores within the SoC from a centralized location. However, as the number of cores integrated into a single chip continues to increase, issuing test vectors from a centralized location is not a scalable solution. The increased distances that test vectors must travel have become a major concern for on-line testing because of their direct impact on application intrusion in terms of energy consumption, network load, and latency. In this paper, we apply a distributed storage technique to bound and minimize this distance, thereby minimizing network load, energy consumption, and test delivery latency across the entire network. Our experiments show that test delivery latency and energy consumption are reduced by approximately 90% for moderately sized NoCs.
Citations: 7
Suitable cache organizations for a novel biomedical implant processor
Pub Date : 2008-10-01 DOI: 10.1109/ICCD.2008.4751921
C. Strydis
This paper evaluates various instruction- and data-cache organizations in terms of performance, power, energy and area on a suitably selected biomedical benchmark suite. The benchmark suite consists of compression, encryption and data-integrity algorithms as well as real implant applications, all executed on biomedical input datasets. The results are used to drive the (micro)architectural design of a novel microprocessor targeting microelectronic implants. Our profiling study identifies an 8 KB L1 instruction cache (when area constraints are relaxed) and a 4 KB L1 data cache, both 2-way set-associative, as the optimal organizations for the envisioned implant processor.
Citations: 3
Journal
2008 IEEE International Conference on Computer Design