2013 International Symposium on System on Chip (SoC): Latest Articles

Evaluating the scalability of test buses

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675278

Alexandre M. Amory, Matheus T. Moreira, Ney Laert Vilar Calazans, F. Moraes, C. Lazzari, M. Lubaszewski

Intra-chip communication architectures evolved from buses to networks-on-chip (NoCs) in order to provide design scalability and increased bandwidth. However, the predominant test architecture for SoCs is still based on buses. While this approach presents advantages, such as simple design and a mature set of automation tools, its scalability is questionable. This paper evaluates this aspect by synthesizing SoCs of different sizes (with more than 100 cores) to layout level and extracting accurate results in terms of wire length, capacitance, and delay. The results compare the wiring for test buses and for NoC links, indicating that these test buses have limited scalability (highly irregular wire lengths and long wires) and may not be suitable for testing future SoCs with hundreds of cores. Finally, we discuss advantages and drawbacks of some approaches proposed in the literature. This discussion might give directions towards new scalable SoC test architectural models.

Citations: 0

A novel SAD architecture for variable block size motion estimation in HEVC video coding

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675269

Purnachand Nalluri, L. N. Alves, A. Navarro

Motion estimation (ME) is one of the most critical and time-consuming tasks in video coding. The increase of block size to 64x64 and the introduction of asymmetric motion partitioning (AMP) in HEVC make variable block size motion estimation more complex and therefore require a specific hardware architecture for real-time implementation. The ME process includes the calculation of the SAD (Sum of Absolute Differences) of two blocks, the current and the reference blocks. The present paper proposes a low-complexity SAD architecture for ME in an HEVC video encoder, which is able to exploit and optimize parallelism at various levels. The proposed architecture was implemented in FPGA and compared with other non-parallel SAD architectures. Synthesis results show that the proposed architecture takes fewer FPGA resources when compared with results from non-parallel architectures and other contributions.
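As a rough illustration of the SAD computation mentioned in the abstract, the following Python sketch computes the SAD of two equally sized blocks and shows how the SAD of a larger block can be obtained by summing the SADs of its four quadrants. The helper names (`sad`, `sub_block`, `sad_from_quadrants`) are hypothetical, and the quadrant reuse is a generic property of SAD assumed here for illustration; it is not taken from the paper's hardware architecture.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values


def sad(current: Block, reference: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(c - r)
        for cur_row, ref_row in zip(current, reference)
        for c, r in zip(cur_row, ref_row)
    )


def sub_block(block: Block, top: int, left: int, size: int) -> Block:
    """Extract a size x size sub-block starting at (top, left)."""
    return [row[left:left + size] for row in block[top:top + size]]


def sad_from_quadrants(current: Block, reference: Block, size: int) -> int:
    """SAD of a size x size block, built by summing its four quadrant SADs.

    Illustrative assumption: small-block results are reused for larger blocks.
    """
    half = size // 2
    return sum(
        sad(sub_block(current, top, left, half),
            sub_block(reference, top, left, half))
        for top in (0, half)
        for left in (0, half)
    )


cur = [[1, 2], [3, 4]]
ref = [[4, 2], [1, 1]]
total = sad(cur, ref)
total_from_parts = sad_from_quadrants(cur, ref, 2)
```

Because SAD is a plain sum over pixels, the full-block value and the sum over quadrants agree, which is what makes reuse across block sizes possible in principle.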

Citations: 37

Crosstalk avoidance coding for reliable data transmission of network on chips

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675266

Z. Shirmohammadi, S. Miremadi

Inter-wire coupling capacitance may lead to crosstalk faults that significantly limit the reliability of NoCs. In this paper, we propose a numerical-based crosstalk avoidance code that can omit the Triplet Opposite Direction (TOD) transitions produced by crosstalk faults. The proposed coding is unambiguous and uses all of the codeword space. Simulations using VHDL for different channel widths show that the proposed method can reduce crosstalk faults in the NoC links with negligible power and area overheads.
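To make the notion of a TOD transition more concrete, here is a small Python sketch of a checker. It assumes the commonly used definition in which three adjacent wires switch from 010 to 101 or from 101 to 010; the abstract does not spell out the definition, so that is an assumption, and the function name `has_tod_transition` is hypothetical. The sketch only detects such transitions; it does not implement the coding scheme proposed in the paper.

```python
def has_tod_transition(prev: int, curr: int, width: int) -> bool:
    """Return True if any three adjacent wires switch 010 -> 101 or 101 -> 010.

    prev and curr are consecutive words on a link that is `width` wires wide.
    The TOD definition used here is an assumption, not taken from the paper.
    """
    tod_pairs = {(0b010, 0b101), (0b101, 0b010)}
    for i in range(width - 2):
        before = (prev >> i) & 0b111  # wires i, i+1, i+2 in the previous word
        after = (curr >> i) & 0b111   # the same wires in the current word
        if (before, after) in tod_pairs:
            return True
    return False


worst_case = has_tod_transition(0b010, 0b101, 3)
benign = has_tod_transition(0b000, 0b111, 3)
```

A crosstalk avoidance code would, under this assumption, pick codewords so that no pair of consecutive words makes the checker return True.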

Citations: 16

On the impact of dynamic data management for distributed local memories in heterogeneous MPSoCs

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675267

Benedikt Noethen, Oliver Arnold, G. Fettweis

With the increasing number of integrated functional units in Multi-Processor Systems-on-Chip (MPSoCs), communication among modules is becoming a major challenge. The overall system performance is not only characterized by the computing power but more often limited by slow interconnections and memory accesses. Furthermore, the available on-chip memory is not efficiently utilized according to the application requirements. This paper describes the concept, implementation and analysis of a data management unit, which improves the utilization of on-chip distributed local memories. Hence data locality and system performance are improved. This strategy leads to a dramatic reduction in the number of accesses to the external memory for data-intensive applications. The impact of this approach on system performance is investigated in this work, which shows a reduction of 53% in the number of accesses to the external memory. In data-limited environments this leads to an improvement of up to 30% in the overall system performance.

Citations: 0

Efficient distributed memory management in a multi-core H.264 decoder on FPGA

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675256

Jiajie Zhang, Zheng Yu, Zhiyi Yu, Kexin Zhang, Zhonghai Lu, A. Jantsch

Memory management is a challenging issue in multi-core architectures. With growing core numbers, Distributed Shared Memory (DSM) is becoming a general trend. In this paper, a DSM-based multi-core architecture is explored and evaluated via an H.264 decoder application. The memory access and communication over the Network-on-Chip are managed by the Data Management Engine (DME). Experimental results realized on an Altera Stratix VI show that a 9-node distributed memory system increases performance by 1.5x compared to centralized memory. Moreover, the performance of the proposed DSM architecture grows linearly with the number of cores deployed.

Citations: 2

Prefetching across a shared memory tree within a Network-on-Chip architecture

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675268

Jamie Garside, N. Audsley

Within Network-on-Chip architectures, the sharing of external memory by many CPUs poses a key design challenge: memory latencies must not dominate overall performance. In this paper, we propose and evaluate a stream-based prefetch unit within a NoC architecture that utilises a separate shared memory tree to provide access to external memory from each CPU tile. The paper shows that prefetching is an appropriate architectural technique within NoCs, enabling better system performance.

Citations: 19

Comparison of analog transactions using statistics

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675282

Alexander W. Rath, Volkan Esen, W. Ecker

The Universal Verification Methodology (UVM) has become a de facto standard in today's functional verification of digital designs. However, it is rarely used for the verification of Designs Under Test containing Real Number Models. This paper presents a new technique using UVM that can be used to compare models of analog circuitry on different levels of abstraction. It makes use of statistical metrics. The presented technique enables us to ensure that Real Number Models used in chip projects match the transistor-level circuitry during the whole life cycle of the project.
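The abstract does not say which statistical metrics are used, so the following Python sketch is only a generic illustration of the idea of comparing two sampled traces statistically instead of sample by sample. The metric choice (mean, standard deviation and maximum of the error), the function names, and the tolerance values are all assumptions made for this example; none of it is UVM code or the paper's method.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def error_statistics(reference: Sequence[float],
                     model: Sequence[float]) -> Dict[str, float]:
    """Summarize the per-sample error between a reference trace and a model trace."""
    if len(reference) != len(model) or not reference:
        raise ValueError("traces must be non-empty and equally long")
    errors = [m - r for r, m in zip(reference, model)]
    return {
        "mean_error": mean(errors),
        "std_error": pstdev(errors),
        "max_abs_error": max(abs(e) for e in errors),
    }


def traces_match(reference: Sequence[float], model: Sequence[float],
                 max_mean: float = 0.01, max_std: float = 0.02) -> bool:
    """Accept the model if its error statistics stay within hypothetical tolerances."""
    stats = error_statistics(reference, model)
    return abs(stats["mean_error"]) <= max_mean and stats["std_error"] <= max_std


transistor_level = [0.0, 1.0, 2.0]   # hypothetical reference samples
real_number_model = [0.5, 1.5, 2.5]  # hypothetical model samples with an offset
offset_stats = error_statistics(transistor_level, real_number_model)
```

In this example the constant 0.5 offset shows up as a mean error of 0.5 with zero spread, so `traces_match` rejects the model under the default tolerances.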

Citations: 2

Scheduling of parallelized synchronous dataflow actors

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675271

Zheng Zhou, K. Desnos, M. Pelcat, J. Nezan, W. Plishker, S. Bhattacharyya

Parallelization of Digital Signal Processing (DSP) software is an important trend for MultiProcessor System-on-Chip (MPSoC) implementation. The performance of DSP systems composed of parallelized computations depends on the scheduling technique, which must in general allocate computation and communication resources for competing tasks, and ensure that data dependencies are satisfied. In this paper, we formulate a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for MPSoC mapping of DSP systems that are represented as Synchronous DataFlow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with two work flows - particle swarm optimization with a mixed integer programming formulation, and particle swarm optimization with a fast heuristic based on list scheduling. We demonstrate that our PAS-targeted scheduling framework provides a useful range of trade-offs between synthesis time requirements and the quality of the derived solutions.
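As background for the SDF graphs mentioned in the abstract, the sketch below computes the repetition vector of a small SDF graph, that is, how many times each actor must fire per schedule iteration so that every edge's token production and consumption balance. This is standard SDF theory rather than the PAS formulation from the paper; the function name `repetition_vector`, the edge tuple layout, and the example graph are hypothetical, and the graph is assumed to be connected. It needs Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm
from typing import Dict, List, Tuple

# (producer, consumer, tokens produced per firing, tokens consumed per firing)
Edge = Tuple[str, str, int, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Dict[str, int]:
    """Solve the balance equations q[src] * prod == q[dst] * cons for every edge."""
    rates: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    progress = True
    while progress:
        progress = False
        for src, dst, prod, cons in edges:
            if src in rates and dst not in rates:
                rates[dst] = rates[src] * prod / cons
                progress = True
            elif dst in rates and src not in rates:
                rates[src] = rates[dst] * cons / prod
                progress = True
            elif src in rates and dst in rates:
                if rates[src] * prod != rates[dst] * cons:
                    raise ValueError("inconsistent SDF graph")
    if len(rates) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational firing rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rates.values()))
    return {a: int(rates[a] * scale) for a in actors}


example = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
```

For the example graph, A must fire three times, B twice and C once per iteration: A produces 6 tokens that B consumes in two firings, and B produces 2 tokens that C consumes in one firing. A scheduler then has to order and map these firings, which is where inter-actor and intra-actor parallelism come into play.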

Citations: 8

FPGA-accelerated color edge detection using a Geometric-Algebra-to-Verilog compiler

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675272

Florian Stock, A. Koch, D. Hildenbrand

Geometric Algebra (GA) is a branch of mathematics that generalizes complex numbers and quaternions. One of the advantages of the framework is that it allows intuitive description and manipulation of geometric objects. While even complex operations can be described concisely, the actual evaluation of these GA expressions is extremely compute-intensive. However, it has significant fine-grained parallelism, which makes it a profitable target for hardware implementation. In this paper, we present the automatic acceleration of a color edge-detection algorithm from a GA description. Using our Gaalop GA compiler with its Verilog back-end, we can show speed-ups of over 1000x even compared to a recent GA processor ASIC.

Citations: 12

System interconnect extensions for fully transparent demand paging in low-cost MMU-less embedded systems

2013 International Symposium on System on Chip (SoC)

Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675257

Lorenzo Zuolo, Gabriele Miorandi, C. Zambelli, P. Olivo, D. Bertozzi

MMU-less embedded systems are the state-of-the-art solution for deeply embedded computing environments. Thanks to the rapid evolution of such devices, the applications that run on top of them are evolving from simple control tasks to more complex applications that involve an Operating System (OS). At the same time, the cost budget remains unchanged in spite of the growing performance requirements. For this reason, traditional code loading and execution techniques like full code shadowing or execute-in-place may lead to a performance bottleneck. Even demand paging strategies lack consensus due to the customization and the complexity of the software infrastructure dealing with memory management. The objective of this work is to implement a transparent hardware-based demand paging strategy for code loading and execution, targeting MMU-less embedded systems. This approach consists of making the system interconnect aware of the memory map, without burdening the legacy OS code, the application code, or the compilation framework. It achieves lower boot-up latency and shorter application execution time than traditional loading and execution schemes.

Citations: 2
