International Conference on Hardware/Software Codesign and System Synthesis: Latest Publications

Intra- and inter-processor hybrid performance modeling for MPSoC architectures

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450156

Frank E. B. Ophelders, S. Chakraborty, H. Corporaal

The heterogeneity of modern MPSoC architectures, coupled with the increasing complexity of the applications mapped onto them, has recently led to considerable interest in hybrid performance modeling techniques. The idea is to apply different modeling and analysis techniques to different subsystems or components of an architecture or application. Such hybrid techniques often turn out to be more efficient and accurate than relying on a single analysis technique for the entire system. The challenge, however, is to combine the different analysis results effectively to obtain conservative performance estimates for the entire system. In this paper we study a hybrid scheme in which certain system components are simulated (e.g., using instruction set simulators), whereas others are analyzed using a formal technique called Real-Time Calculus (RTC). The main novelty of our approach is that we apply this hybrid technique even to multiple tasks mapped onto a single processing element. In contrast, previous approaches relied on either full simulation or RTC-based analysis for an entire architectural component (e.g., a processor or a bus). The techniques we develop therefore allow both intra- and inter-processor hybrid performance modeling, and we show how the different analysis results can be combined to efficiently obtain tight performance estimates for complex MPSoC architectures. We demonstrate the usefulness of this approach using an MPEG-2 decoder application that is partitioned and mapped onto two processing elements connected by FIFO buffers.
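
Real-Time Calculus describes an event stream by an upper arrival curve alpha(delta) and a resource by a lower service curve beta(delta); the worst-case delay is the maximum horizontal distance between the two. The following is a minimal sketch of that bound computed numerically, assuming a periodic-with-jitter event stream and a rate-latency service curve. Both curve shapes and all function names are illustrative choices, not taken from the paper, which couples this kind of analysis with instruction set simulation.

```python
import math


def pjd_arrival(delta, period, jitter, wcet):
    """Upper arrival curve of a periodic-with-jitter stream, in demanded cycles.
    Counts the most events that can fall in a closed window of length delta
    (so one event is possible even at delta == 0), each needing `wcet` cycles."""
    return ((delta + jitter) // period + 1) * wcet


def rate_latency_service(delta, rate, latency):
    """Lower service curve: at least `rate` cycles per time unit after `latency`."""
    return max(0.0, rate * (delta - latency))


def delay_bound(alpha, beta, horizon):
    """Maximum horizontal distance between alpha and beta, scanned on an integer
    time grid up to `horizon`. Rounding tau up to whole time units keeps the result
    on the pessimistic side for integer-valued curve parameters.
    Assumes rate > 0 and that the long-term service rate is at least the long-term
    demand rate; otherwise the result grows with `horizon` and is not a valid bound."""
    worst = 0
    for t in range(horizon + 1):
        demand = alpha(t)
        tau = 0
        while beta(t + tau) < demand:  # smallest tau with beta(t + tau) >= alpha(t)
            tau += 1
        worst = max(worst, tau)
    return worst


if __name__ == "__main__":
    alpha = lambda d: pjd_arrival(d, period=10, jitter=0, wcet=4)
    beta = lambda d: rate_latency_service(d, rate=1.0, latency=5)
    print(delay_bound(alpha, beta, horizon=100))  # prints 9
```

With period 10, no jitter, 4 cycles of demand per event, rate 1 and latency 5, the scan returns 9, which agrees with the closed-form rate-latency bound latency + burst/rate = 5 + 4/1.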

Citations: 4

System-level mitigation of WID leakage power variability using body-bias islands

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450197

S. Garg, Diana Marculescu

Adaptive Body Biasing (ABB) is a widely used technique for mitigating the increasing impact of manufacturing process variations on leakage power dissipation. Its efficacy can be improved by partitioning a design into a number of "body-bias islands," each with its own body-bias voltage. In this paper, we propose a system-level leakage variability mitigation framework that partitions a multiprocessor system into body-bias islands at processing element (PE) granularity at design time and optimally assigns body-bias voltages to each island post-fabrication. Unlike prior gate- and circuit-level partitioning techniques, which constrain the global clock frequency of the system, we allow each island to run at a different speed and constrain only the relevant system performance metrics (in our case, the execution deadlines). Experimental results show that the proposed framework reduces the mean and standard deviation of leakage power dissipation compared with a baseline system without ABB. At the same time, the proposed techniques provide significant runtime improvements over a previously proposed Monte Carlo-based technique while achieving similar reductions in leakage power dissipation.
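
The post-fabrication step, choosing one body-bias voltage per island so that total leakage is minimized while deadlines are still met, can be illustrated as a small combinatorial search. The sketch below assumes a simple chain of tasks, one per island, with a single end-to-end deadline, and uses invented per-island (bias, frequency, leakage) tables. The abstract does not give the paper's models or solver; brute-force enumeration here is only a stand-in for its optimal assignment, and the names `ISLAND_A`, `ISLAND_B` and `assign_body_bias` are hypothetical.

```python
from itertools import product

# Hypothetical per-island options measured after fabrication:
# (body_bias_V, freq_MHz, leakage_mW). In this sketch's sign convention,
# positive (forward) bias speeds an island up but raises its leakage.
ISLAND_A = [(-0.3, 400.0, 2.0), (0.0, 500.0, 5.0), (0.3, 600.0, 12.0)]
ISLAND_B = [(-0.3, 350.0, 1.5), (0.0, 450.0, 4.0), (0.3, 550.0, 10.0)]


def assign_body_bias(islands, cycles, deadline_us):
    """Pick one option per island minimizing total leakage, subject to an
    end-to-end deadline for a chain of tasks: sum(cycles_i / f_i) <= deadline.
    Exhaustive search, so only suitable for a handful of islands."""
    best = None
    for choice in product(*islands):
        # cycles / MHz gives microseconds
        latency_us = sum(c / opt[1] for c, opt in zip(cycles, choice))
        if latency_us > deadline_us:
            continue  # this combination misses the deadline
        leakage = sum(opt[2] for opt in choice)
        if best is None or leakage < best[0]:
            best = (leakage, choice)
    return best  # None if no combination meets the deadline


if __name__ == "__main__":
    print(assign_body_bias([ISLAND_A, ISLAND_B], cycles=[1000, 1000], deadline_us=4.5))
```

In this example the cheapest feasible choice leaves both islands at zero bias (9.0 mW total), because every lower-leakage combination pushes the chain past the 4.5 µs deadline.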

Citations: 8

Concurrency emulation and analysis of parallel applications for multi-processor system-on-chip co-design

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450138

G. Beltrame, L. Fossati, D. Sciuto

This paper presents a novel technique for modeling and simulating parallel applications for Multi-Processor Systems-on-Chip (MPSoCs). The technique consists of an application-transparent emulation of OS primitives, including task creation, scheduling, and synchronization; this emulation guarantees compatibility with any program compiled against the standard POSIX library, independently of the target OS. The methodology can be used to perform initial HW/SW partitioning and concurrent engineering of a given application, since it allows any software routine to be transparently emulated with SystemC modules. The approach has been verified on a large set of multi-threaded benchmarks written in both the POSIX Threads and OpenMP programming styles. Results show that the methodology enables (a) fast simulation of POSIX applications, (b) accurate analysis of multi-threaded applications, and (c) co-design and fast preliminary hardware/software partitioning.

Citations: 4

Design and defect tolerance beyond CMOS

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450187

X. Hu, A. Khitun, K. Likharev, M. Niemier, M. Bao, Kang L. Wang

It is well recognized that novel computational models, devices, and technologies are needed to sustain the remarkable advancement of CMOS-based VLSI circuits and systems. Regardless of the model, device, or technology, any enhancement or replacement of CMOS must show significant gains in at least one of the key metrics (including speed, power, and cost) for at least a subset of the application domains that currently employ CMOS circuits. In addition, effective defect-tolerance techniques are critical for the successful adoption of any new computing device, because nanoscale structures will have defect rates much higher than those of today's CMOS chips. Identifying the application domains that could benefit most from a new model, device, or technology, and ensuring that the resulting system meets functional requirements in the presence of defects, requires synergistic efforts from physical scientists and from circuit and system design researchers. This paper collects three contributions, each focusing on one particular emerging technology, that present a basic introduction to the technology, some of its unique features in contrast with CMOS, its potential application domains, and the new opportunities it may bring to defect-tolerant design. The contributions cover both traditional and nontraditional state representations that use either electronic or magnetic interactions.

Citations: 5

Distributed and low-power synchronization architecture for embedded multiprocessors

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2008-10-19 DOI: 10.1145/1450135.1450153

Chenjie Yu, Peter Petrov

In this paper we present a framework for a distributed and very low-cost implementation of synchronization controllers and protocols for embedded multiprocessors. The proposed architecture effectively implements the queued-lock semantics in a completely distributed way. The proposed approach to synchronization implementation not only completely eliminates the overwhelming bus contention traffic when multiple cores compete for a synchronization variable, but also achieves very high energy efficiency as the local synchronization controller can efficiently determine, without any bus transactions or local cache spinning, the exact timing of when the lock is made available to the local processor. Application-specific information regarding synchronization variables in the local task is exploited in implementing the distributed synchronization protocol. The local synchronization controllers enable the system software or the thread library to implement various low-power policies, such as disabling the cache accesses or even completely powering down the local processor while waiting for a synchronization variable.
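
The queued-lock semantics described above can be illustrated in software. The following Python sketch is an analogy under stated assumptions, not the paper's hardware design: each waiting thread sleeps on its own private `threading.Event` (standing in for the per-core local synchronization controller), and `release()` hands the lock directly to the oldest waiter, so no waiter ever polls a shared variable. The class name `QueuedLock` is hypothetical.

```python
import threading
from collections import deque


class QueuedLock:
    """FIFO queued lock. Each waiter blocks on its own private Event, a software
    stand-in for a per-core local flag, and release() hands ownership directly
    to the oldest waiter instead of letting all waiters race for a shared variable."""

    def __init__(self):
        self._meta = threading.Lock()  # guards only _held and _waiters
        self._waiters = deque()        # one Event per blocked thread, in arrival order
        self._held = False

    def acquire(self):
        with self._meta:
            if not self._held:
                self._held = True      # uncontended: take the lock immediately
                return
            my_flag = threading.Event()
            self._waiters.append(my_flag)
        my_flag.wait()                 # sleep on the local flag until handed the lock

    def release(self):
        with self._meta:
            if self._waiters:
                self._waiters.popleft().set()  # ownership passes on; _held stays True
            else:
                self._held = False

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

Handing ownership over inside `release()` (rather than clearing `_held` and letting waiters compete) is what gives strict FIFO order, mirroring how a queued lock grants the lock to exactly one known successor.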

Citations: 9

Fast Co-Simulation of Transformative Systems with OS Support

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2004-09-08 DOI: 10.1109/CODES+ISSS.2004.28

Zhengting He, A. Mok

Transformative applications are a class of dataflow computation characterized by iterative behavior. The problem of partitioning a transformative application specification onto a set of available hardware (HW) and software (SW) processing elements (PEs), and of deriving a job execution order (schedule) on them, has been well studied, but obtaining fast simulation of these applications poses different constraints. In this paper, we propose an efficient framework that uses a symmetric multiprocessor (SMP) simulation host to achieve fast HW/SW co-simulation of transformative applications, given the partition solutions and the derived schedulers. The framework overcomes limitations of the existing Linux SMP kernel and requires only a modest amount of modification to it. We also present a heuristic algorithm that effectively assigns simulation tasks to the processors of the simulation host, considering both the average job simulation time on each processor and other simulation overheads. Our experiments show that the algorithm finds satisfactory suboptimal solutions with very little computation time. Based on the resulting task assignment, simulation time can be reduced by 25% to 50% compared with the obvious but naive approach.
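
The abstract does not spell out its assignment heuristic. As a stand-in, the sketch below uses the classic longest-processing-time-first (LPT) greedy rule: sort simulation tasks by average simulation time and place each on the currently least-loaded host processor, charging a fixed per-task overhead. The flat `overhead` term is a hypothetical simplification of the "other simulation overhead" the authors consider, and `lpt_assign` is an illustrative name.

```python
def lpt_assign(task_times, num_procs, overhead):
    """Greedy longest-processing-time-first assignment.
    task_times: {task_name: average simulation time}; each placed task also
    costs a fixed `overhead` (hypothetical stand-in for per-task simulation overhead).
    Returns ({task_name: processor_index}, makespan)."""
    loads = [0.0] * num_procs
    assignment = {}
    # Longest tasks first; ties broken by name for deterministic output.
    for task, t in sorted(task_times.items(), key=lambda kv: (-kv[1], kv[0])):
        p = min(range(num_procs), key=lambda i: (loads[i], i))  # least-loaded processor
        loads[p] += t + overhead
        assignment[task] = p
    return assignment, max(loads)


if __name__ == "__main__":
    tasks = {"a": 7.0, "b": 5.0, "c": 4.0, "d": 3.0, "e": 1.0}
    print(lpt_assign(tasks, num_procs=2, overhead=0.5))
```

For the five example tasks on two processors, this yields a makespan of 11.5 time units; LPT is a reasonable default because it is fast and its makespan is provably within a constant factor of optimal, which matches the abstract's emphasis on good suboptimal solutions found quickly.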

Citations: 1

Embedded systems education: how to teach the required skills?

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016781

P. Marwedel, D. Gajski, Erwin De Kock, Hugo De Man, M. Sami, I. Söderquist

The goal of this panel is to contrast existing approaches to embedded system education with the needs in industry.

Citations: 7

Hardware synthesis from coarse-grained dataflow specification for fast HW/SW cosynthesis

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2004-09-08 DOI: 10.1145/1016720.1016730

Hyunuk Jung, S. Ha

This paper concerns automatic hardware synthesis from a dataflow graph (DFG) specification for fast HW/SW cosynthesis. A node in the DFG represents a coarse-grain block, such as an FIR filter or a DCT, and a port of a block may consume multiple data samples per invocation; this distinguishes our approach from behavioral synthesis and complicates the problem. In the presented design methodology, a dataflow graph with a specified algorithm can be mapped to various hardware structures according to the resource allocation and schedule information. Compared with previous approaches, this simplifies management of the area/performance tradeoff in hardware design and widens the design space for hardware implementations of a dataflow graph. Experiments on several examples demonstrate the usefulness of the proposed technique.
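
Ports that consume or produce several samples per invocation are the defining feature of synchronous dataflow (SDF), where a consistent schedule first requires solving the balance equations q[src] * produced = q[dst] * consumed for every edge. The sketch below computes that repetition vector. Treating the paper's DFG as SDF is an assumption made here for illustration; the example graph (a 64-sample DCT block between a one-sample source and an 8-sample sink) and the function name are hypothetical, and the code covers only the rate-consistency step, not hardware synthesis.

```python
from collections import defaultdict, deque
from fractions import Fraction
from functools import reduce
from math import gcd, lcm  # math.lcm needs Python 3.9+


def repetition_vector(nodes, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.
    edges: (src, dst, produced_per_firing, consumed_per_firing).
    Assumes a connected graph; returns the smallest positive integer solution
    or raises ValueError if the rates are inconsistent."""
    adj = defaultdict(list)
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # q[dst] = q[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # q[src] = q[dst] * cons / prod
    rate = {nodes[0]: Fraction(1)}
    frontier = deque([nodes[0]])
    while frontier:  # propagate relative firing rates breadth-first
        n = frontier.popleft()
        for m, ratio in adj[n]:
            r = rate[n] * ratio
            if m not in rate:
                rate[m] = r
                frontier.append(m)
            elif rate[m] != r:
                raise ValueError(f"inconsistent sample rates at node {m!r}")
    if len(rate) != len(nodes):
        raise ValueError("graph is not connected")
    scale = reduce(lcm, (r.denominator for r in rate.values()), 1)
    ints = {n: int(r * scale) for n, r in rate.items()}  # clear denominators
    g = reduce(gcd, ints.values())
    return {n: ints[n] // g for n in nodes}


if __name__ == "__main__":
    nodes = ["src", "dct", "sink"]
    edges = [("src", "dct", 1, 64), ("dct", "sink", 64, 8)]
    print(repetition_vector(nodes, edges))  # {'src': 64, 'dct': 1, 'sink': 8}
```

The resulting firing counts are what a synthesis step would use to size buffers and decide how many samples each coarse-grain block handles per schedule period.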

Citations: 1

Cellular Handset Technology System Requirements and Integration Trends

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2004-09-08 DOI: 10.1109/CODES+ISSS.2004.10

S. Mattisson

Summary form only given. In ten years, the cellular telephone has evolved from a tool for professionals into an indispensable consumer product with very high market penetration. At the same time, handset cost, weight, and standby time have improved by more than a factor of ten. These factors have been critical to the success story of the mobile phone. The technical aspects behind this rapid handset evolution are discussed, in particular what advances in radio architecture (for example, the zero-IF GSM receiver), baseband (CMOS) technology, and radio system design have meant for reducing size, weight, cost, and power consumption. Future challenges, such as SW-DSP-digital-RF partitioning, linear multi-mode modulation with high linearity requirements, digital leakage issues, and power consumption limitations in multimedia handsets, are discussed with future handset generations in mind.

Citations: 0

Architectural versus physical solutions for on-chip communication challenges

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2003-10-01 DOI: 10.1145/944645.944665

D. Burger

The growing gap between transistor and global wire speeds in sub-100-nanometer technologies poses numerous challenges to computer architects and circuit designers. This challenge looks to be even more significant in far-future technologies such as molecular-scale wire transmission, whether using carbon nanotubes or quantum dots. A fixed design scales because its area shrinks along with feature size, but future designs that occupy a constant area see rapidly increasing global latencies. Two approaches to address these latencies are (1) to use signaling and design techniques to reduce the actual latencies, and (2) to use architectural innovations to reduce the distance that signals must propagate in the common case. In this talk, after an overview of the communication latency issue, I describe current research that aims to reduce the average communication distance for processing and memory system signals. For processor designs, I will describe the Static Placement, Dynamic Issue (SPDI) execution model, which allows the compiler to place dependent instructions near one another and which is being implemented in the TRIPS processor. I will also describe Non-Uniform Cache Access (NUCA) designs, which attempt to reduce the average signal distance for cache accesses.
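
The NUCA argument rests on simple arithmetic: a uniform-access cache must budget every hit for its farthest bank, whereas a non-uniform design lets each hit pay only its own bank's distance. The toy model below makes that concrete. The linear latency-per-hop model, the bank distances, and the hit distribution are all hypothetical and are not taken from the talk.

```python
def uniform_hit_latency(bank_hops, base_cycles, cycles_per_hop):
    """A uniform-access cache must budget for its farthest bank on every hit."""
    return base_cycles + cycles_per_hop * max(bank_hops)


def nuca_hit_latency(bank_hops, hit_fraction, base_cycles, cycles_per_hop):
    """Expected hit latency when each bank is accessed at its own distance-dependent latency."""
    if abs(sum(hit_fraction) - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(f * (base_cycles + cycles_per_hop * h)
               for h, f in zip(bank_hops, hit_fraction))


if __name__ == "__main__":
    hops = [1, 2, 3, 4]               # hypothetical bank distances from the core
    hits = [0.50, 0.25, 0.15, 0.10]   # hypothetical share of hits per bank
    print(uniform_hit_latency(hops, 3, 2))                # 11
    print(round(nuca_hit_latency(hops, hits, 3, 2), 2))   # 6.7
```

With hits skewed toward nearby banks (for example, by placing or migrating frequently used lines closer to the core), the expected hit latency in this example drops from 11 to about 6.7 cycles; the benefit shrinks as the hit distribution flattens.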

Citations: 0
