
2012 IEEE 30th International Conference on Computer Design (ICCD): Latest Papers

BIXBAR: A low cost solution to support dynamic link reconfiguration in networks on chip
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378617
Pablo Abad Fidalgo, P. Prieto, Valentin Puente, J. Gregorio
Improving link utilization is a key aspect in interconnection network design. Reconfigurable-direction interrouter links optimize network resource utilization, which substantially increases the maximum achievable throughput. In the case of On-chip Networks, the short distance between adjacent routers makes feasible fast link arbitration, which makes dynamic link reconfiguration an attractive solution. In this paper we propose a low-cost router micro-architecture that is able to deal with reconfigurable links with a marginal cost over a conventional router. The key element of the proposal is a bidirectional crossbar, which enables reconfiguration of links, without significantly increasing router area and energy. The results obtained indicate that with this proposal, system performance could be improved, for some selected workloads, by up to 25% while energy-performance tradeoff is reduced by 20%, avoiding the additional costs entailed in other state-of-the-art routers capable of performing dynamic link reconfiguration.
Citations: 0
Interface design for synthesized structural hybrid microarchitectural simulators
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378650
Zhuo Ruan, D. Penry
Computer designers rely upon near-cycle-accurate microarchitectural simulators to explore the design space of new systems. Hybrid simulators which offload simulation work onto FPGAs overcome the speed limitations of software-only simulators as systems become more complex; however, such simulators must be automatically synthesized or the time to design them becomes prohibitive. The performance of a hybrid simulator is significantly affected by how the interface between software and hardware is constructed. We characterize the design space of interfaces for synthesized structural hybrid microarchitectural simulators, provide implementations for several such interfaces, and determine the tradeoffs involved in choosing an efficient design candidate.
Citations: 1
A stochastic reconfigurable architecture for fault-tolerant computation with sequential logic
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378656
Peng Li, Weikang Qian, D. Lilja
Computation performed on stochastic bit streams is less efficient than that based on a binary radix because of its long latency. However, for certain complex arithmetic operations, computation on stochastic bit streams can consume less energy and tolerate more soft errors. In addition, the latency issue could be solved by using a faster clock frequency or in combination with a parallel processing approach. To take advantage of this computing technique, previous work proposed a combinational logic-based reconfigurable architecture to perform complex arithmetic operations on stochastic streams of bits. In this paper, we enhance and extend this reconfigurable architecture using sequential logic. Compared to the previous approach, the proposed reconfigurable architecture takes less hardware area and consumes less energy, while achieving the same performance in terms of processing time and fault-tolerance.
Citations: 26
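As a rough illustration of the stochastic bit-stream computation referred to in the abstract above, the following Python sketch multiplies two probabilities by ANDing two independent random bit streams. It is not the authors' reconfigurable architecture; the function names (`to_stream`, `from_stream`, `sc_multiply`) and the default stream length are assumptions made only for this example.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p as a random bit stream: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a bit stream back to a probability: the fraction of 1s."""
    return sum(bits) / len(bits)


def sc_multiply(a, b, length=4096, seed=0):
    """Approximate a * b by ANDing two independently generated bit streams.

    For independent streams, P(x AND y = 1) = P(x = 1) * P(y = 1),
    so a single AND gate per bit acts as a multiplier.
    """
    rng = random.Random(seed)  # hypothetical software stand-in for a hardware random source
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    return from_stream([x & y for x, y in zip(stream_a, stream_b)])


if __name__ == "__main__":
    print(sc_multiply(0.5, 0.4))  # an estimate close to 0.2
```

The estimate only approaches the exact product as the stream grows, which is the latency cost the abstract mentions. Conversely, flipping a single bit changes the decoded value by only 1/length, which gives some intuition for the soft-error tolerance claimed for this representation.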
Distributed thermal-aware task scheduling for 3D Network-on-Chip
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378690
Yingnan Cui, Wei Zhang, Hao Yu
The development of 3D integration technology significantly improves the bandwidth of network-on-chip (NoC) systems. However, the high integration density enabled by 3D technology also raises severe concerns about temperature increase, which may impair system reliability and degrade performance. Task scheduling has been regarded as an effective approach to eliminating thermal hotspots without introducing hardware overhead. However, centralized thermal-aware task scheduling algorithms for 3D-NoC are limited by high computational complexity as the system scale increases. In this paper, we propose a distributed agent-based thermal-aware task scheduling algorithm for 3D-NoC which shows high scheduling efficiency and high scalability. Experimental results show that, compared to centralized algorithms, our algorithm can achieve up to 13 °C reduction in the peak temperature of the system without sacrificing performance.
Citations: 3
Task model suitable for dynamic load balancing of real-time applications in NoC-based MPSoCs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378616
S. J. Filho, A. Aguiar, F. Magalhães, Oliver B. Longhi, Fabiano Hessel
Modern embedded systems implemented as Multiprocessor Systems-on-Chip (MPSoCs) benefit from resources that were previously available solely in general-purpose computers. Currently, these systems are able to provide more features at the cost of increased design complexity. In this scenario, the applications' behaviour has changed. In the past, the majority of applications showed static behaviour throughout their entire lifetime. Applications could be divided into tasks and mapped onto processing elements at design time. Currently, the applications' dynamic nature demands efficient dynamic load balancing techniques with different task mapping strategies, although a fair static mapping still helps increase overall system performance. In this paper we present a task model suitable for dynamic load balancing of real-time applications with special support for Network-on-Chip (NoC)-based MPSoCs that aims to stabilize the system load throughout its lifetime. Results show a reduction in both system stabilization time (mean of 47.62%) and deadline misses (mean of 32.28%) for several benchmarks, compared to classic approaches which employ a centralized migration manager.
Citations: 7
A 3D stacked high performance scalable architecture for 3D Fourier Transform
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378692
G. Voicu, M. Enachescu, S. Cotofana
This paper proposes and evaluates a novel high-performance systolic architecture for 3D Fourier Transform specially tailored for 3D stacking integration with Through Silicon Vias. Our cuboid-shaped systolic network of orthogonally connected processing elements makes use of the DFT algorithm to compute an N1×N2×N3-point 3D-FT with an asymptotic time complexity of O(N1+N2+N3) multiplications. When compared with state-of-the-art 3D-FFT implementation on the Anton machine, a physical synthesized implementation of our architecture on the same 90nm technology node achieves 7.73× and 5.88× speed improvement when computing 16×16×16 and 32×32×32 FT, respectively.
Citations: 1
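To make the N1×N2×N3-point 3D-FT in the abstract above concrete, here is a minimal pure-Python sketch that computes it by applying the 1D DFT definition along each of the three axes in turn. This is only the textbook separable formulation executed sequentially; it does not model the authors' systolic array, its through-silicon-via stacking, or the O(N1+N2+N3) time bound, which relies on parallel hardware. The function names `dft_1d` and `dft_3d` are assumptions for this example.

```python
import cmath


def dft_1d(x):
    """Direct 1D DFT: X[j] = sum_k x[k] * exp(-2*pi*i*j*k/n)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def dft_3d(a):
    """3D DFT of a nested list a[i][j][k], one axis at a time."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])

    # Transform along the third axis; this also builds a fresh copy of the data.
    out = [[dft_1d(a[i][j]) for j in range(n2)] for i in range(n1)]

    # Transform along the second axis.
    for i in range(n1):
        for k in range(n3):
            line = dft_1d([out[i][j][k] for j in range(n2)])
            for j in range(n2):
                out[i][j][k] = line[j]

    # Transform along the first axis.
    for j in range(n2):
        for k in range(n3):
            line = dft_1d([out[i][j][k] for i in range(n1)])
            for i in range(n1):
                out[i][j][k] = line[i]

    return out
```

A quick sanity check is that a unit impulse at the origin transforms to an array of all ones, and a constant array transforms to a single nonzero value at index (0, 0, 0).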
Parametric throughput analysis of scenario-aware dataflow graphs
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378644
M. Damavandpeyma, S. Stuijk, M. Geilen, T. Basten, H. Corporaal
Scenario-aware dataflow graphs (SADFs) efficiently model dynamic applications. The throughput of an application is an important metric to determine the performance of the system. For example, the number of frames per second output by a video decoder should always stay above a threshold that determines the quality of the system. During design-space exploration (DSE) or run-time management (RTM), numerous throughput calculations have to be performed, and they have to be performed as fast as possible. For synchronous dataflow graphs (SDFs), a technique exists that extracts throughput expressions from a parameterized SDF in which the execution time of the tasks (actors) is a function of some parameters. Evaluation of these expressions can be done in a negligible amount of time and provides the throughput for a specific set of parameter values. This technique is not applicable to SADFs. In this paper, we present a technique, based on Max-Plus automata, that finds throughput expressions for a parameterized SADF. Experimental evaluation shows that our technique can be applied to realistic applications. These results also show that our technique scales better and is faster than the available parametric throughput analysis technique for SDFs.
Citations: 28
Balancing performance and fault detection for GPGPU workloads
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378702
J. Backer, R. Karri
GPUs are increasingly being used for processing highly parallel scientific and high performance workloads. Such applications require correctness and accuracy of the computation. GPUs lack adequate support for detecting hardware faults that may lead to computation errors. We present a tunable fault detection scheme that allows one to balance GPU performance and fault checking by configuring the amount of resources to allocate for detection and the frequency of checking for faults.
Citations: 3
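The tradeoff between performance and checking frequency described in the abstract above can be illustrated with a small Python sketch. The abstract does not specify the detection mechanism, so this example simply assumes duplicate execution with comparison; the function `run_with_checks` and its `check_every` parameter are hypothetical and are not taken from the paper.

```python
def run_with_checks(kernel, inputs, check_every=4):
    """Run kernel on each input and re-execute every check_every-th call to compare.

    A smaller check_every means more redundant work (lower throughput) but a
    better chance of catching a transient fault; check_every <= 0 disables checks.
    Returns the results and the indices where the two executions disagreed.
    """
    results = []
    mismatches = []
    for index, value in enumerate(inputs):
        output = kernel(value)
        if check_every > 0 and index % check_every == 0:
            # Redundant execution: a disagreement signals a possible fault.
            if kernel(value) != output:
                mismatches.append(index)
        results.append(output)
    return results, mismatches
```

In this sketch, checking every call doubles the work, while checking every fourth call adds only 25% overhead but leaves three of every four results unverified.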
A spectral transform approach to stochastic circuits
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378658
Armin Alaghi, J. Hayes
Stochastic computing (SC) processes data in the form of long pseudo-random bit-streams denoting probabilities. Its key advantages are simple computational elements and high soft-error tolerance. Recent technology developments have revealed important new SC applications such as image processing and LDPC decoding. Despite its long history, SC still lacks a comprehensive design methodology; existing methods tend to be ad hoc and limited to a few arithmetic functions. We demonstrate a fundamental relation between stochastic circuits and spectral transforms. Based on this, we propose a transform approach to the analysis and synthesis of SC circuits. We illustrate the approach for a variety of basic combinational SC design problems, and show that the area cost associated with stochastic number generation can be significantly reduced.
Citations: 45
A PRET microarchitecture implementation with repeatable timing and competitive performance
Pub Date : 2012-09-30 DOI: 10.1109/ICCD.2012.6378622
Isaac Liu, J. Reineke, David Broman, Michael Zimmer, Edward A. Lee
We contend that repeatability of execution times is crucial to the validity of testing of real-time systems. However, computer architecture designs fail to deliver repeatable timing, a consequence of aggressive techniques that improve average-case performance. This paper introduces the Precision-Timed ARM (PTARM), a precision-timed (PRET) microarchitecture implementation that exhibits repeatable execution times without sacrificing performance. The PTARM employs a repeatable thread-interleaved pipeline with an exposed memory hierarchy, including a repeatable DRAM controller. Our benchmarks show an improved throughput compared to a single-threaded in-order five-stage pipeline, given sufficient parallelism in the software.
Citations: 106