
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

DeSpErate: Speeding-up design space exploration by using predictive simulation scheduling
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.231
Giovanni Mariani, G. Palermo, V. Zaccaria, C. Silvano
The design space exploration (DSE) phase is used to tune configurable system parameters and it generally consists of a multiobjective optimization (MOO) problem. It is usually done at the pre-design phase and consists of the evaluation of large design spaces where each configuration requires a long simulation. Several heuristic techniques have been proposed in the past and the recent trend is reducing the exploration time by using analytic prediction models to approximate the system metrics, effectively pruning sub-optimal configurations from the exploration scope. However, there is still a missing path towards the effective usage of the underlying computing resources used by the DSE process. In this work, we will show that an alternative and almost orthogonal approach, focused on exploiting the available parallelism in terms of computing resources, can be used to better schedule the simulations and to obtain a high speedup with respect to state-of-the-art approaches, without compromising the accuracy of exploration results. Experimental results will be presented by dealing with the DSE problem of a shared memory multi-core system considering a variable number of available parallel resources to support the DSE phase.
Citations: 1
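The DeSpErate abstract above frames DSE as a multiobjective optimization problem in which sub-optimal configurations are pruned. As a minimal sketch of what "sub-optimal" means in that setting, the snippet below filters a list of configurations down to its Pareto front. It assumes every objective is minimized; the function names and the sample metric pairs are hypothetical and are not taken from the paper.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one.

    Assumption: all objectives are to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (latency, energy) pairs for five simulated configurations.
configs = [(10, 5), (8, 7), (12, 4), (9, 9), (8, 6)]
front = pareto_front(configs)
print(front)
```

The filter is quadratic in the number of configurations, which is acceptable for a sketch because the simulations themselves dominate the cost of a real exploration.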
Time-decoupled parallel SystemC simulation
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.204
Jan Weinstock, Christoph Schumacher, R. Leupers, G. Ascheid, L. Tosoratto
With increasing system size and complexity, designers of embedded systems face the challenge of efficiently simulating these systems in order to enable target specific software development and design space exploration as early as possible. Today's multicore workstations offer enormous computational power, but traditional simulation engines like the OSCI SystemC kernel only operate on a single thread, thereby leaving a lot of computational potential unused. Most modern embedded system designs include multiple processors. This work proposes SCope, a SystemC kernel that aims at exploiting the inherent parallelism of such systems by simulating the processors on different threads. A lookahead mechanism is employed to reduce the required synchronization between the simulation threads, thereby further increasing simulation speed. The virtual prototype of the European FP7 project EURETILE system simulator is used as demonstrator for the proposed work, showing a speedup of 4.01× on a four core host system compared to sequential simulation.
Citations: 29
Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.169
P. Burgio, Giuseppe Tagliavini, Francesco Conti, A. Marongiu, L. Benini
Modern designs for embedded systems are increasingly embracing cluster-based architectures, where small sets of cores communicate through tightly-coupled shared memory banks and high-performance interconnections. At the same time, the complexity of modern applications requires new programming abstractions to exploit dynamic and/or irregular parallelism on such platforms. Supporting dynamic parallelism in systems which i) are resource-constrained and ii) run applications with small units of work calls for a runtime environment which has minimal overhead for the scheduling of parallel tasks. In this work, we study the major sources of overhead in the implementation of OpenMP dynamic loops, sections and tasks, and propose a hardware implementation of a generic Scheduling Engine (HWSE) which fits the semantics of the three constructs. The HWSE is designed as a tightly-coupled block to the PEs within a multi-core cluster, communicating through a shared-memory interface. This allows very fast programming and synchronization with the controlling PEs, fundamental to achieving fast dynamic scheduling, and ultimately to enable fine-grained parallelism. We prove the effectiveness of our solutions with real applications and synthetic benchmarks, using a cycle-accurate virtual platform.
Citations: 5
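The abstract above concerns the overhead of scheduling OpenMP dynamic loops. To make the idea of dynamic loop scheduling concrete, here is a software-only sketch in which worker threads repeatedly claim the next chunk of iterations from a shared counter. It is an illustration of the general technique, not the HWSE design from the paper; `dynamic_for`, its parameters, and the sample loop body are hypothetical names.

```python
import threading


def dynamic_for(n_iters, chunk, n_workers, body):
    """Run body(i) for i in range(n_iters), handing out chunks on demand."""
    next_start = 0
    lock = threading.Lock()

    def worker():
        nonlocal next_start
        while True:
            # Claiming a chunk is the critical section; this is the
            # per-chunk scheduling overhead that a runtime has to pay.
            with lock:
                start = next_start
                next_start += chunk
            if start >= n_iters:
                return
            for i in range(start, min(start + chunk, n_iters)):
                body(i)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


results = [0] * 10


def square(i):
    # Each index is written by exactly one worker, so no extra locking is needed.
    results[i] = i * i


dynamic_for(n_iters=10, chunk=3, n_workers=4, body=square)
print(results)
```

Smaller chunks balance load better but enter the critical section more often, which is the trade-off that motivates making chunk dispatch as cheap as possible.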
Image progressive acquisition for hardware systems
Pub Date : 2014-03-24 DOI: 10.5555/2616606.2617107
Jianxiong Liu, C. Bouganis, P. Cheung
As the resolution of digital images increases, accessing raw image data from memory has become a major consideration during the design of image/video processing systems. This is due to the fact that the bandwidth requirement and energy consumption of such image accessing process has increased. Inspired by the successful application of progressive image sampling techniques in many image processing tasks, this work proposes to apply similar concept within hardware systems to efficiently trade image quality for reduced memory bandwidth requirement and lower energy consumption. Based on this idea, a hardware system is proposed that is placed between the memory subsystem and the processing core of the design. The proposed system alters the conventional memory access pattern to progressively and adaptively access pixels from a target memory external to the system. The sampled pixels are used to reconstruct an approximation to the ground truth, which is stored in an internal image buffer for further processing. The system is prototyped on FPGA and its performance evaluation shows that a saving of up to 85% of memory accessing time and 33%/45% of image acquisition time/energy is achieved on the benchmark image “lena” while maintaining a PSNR of about 30 dB.
Citations: 2
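The abstract above reports reconstruction quality as a PSNR of about 30 dB. For readers unfamiliar with the metric, the sketch below computes PSNR from the mean squared error between a reference and an approximation, using the standard definition PSNR = 10 * log10(MAX^2 / MSE). The flat pixel lists and the 8-bit peak value of 255 are assumptions made for the example, not details from the paper.

```python
import math


def psnr(reference, approx, max_value=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(approx):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - a) ** 2 for r, a in zip(reference, approx)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel reference and its reconstruction.
ref = [52, 55, 61, 59]
rec = [50, 55, 64, 59]
value = psnr(ref, rec)
print(round(value, 2))
```

Higher values mean the approximation is closer to the reference; an exact match gives infinity because the error term is zero.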
Library-based scalable refinement checking for contract-based design
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.167
Antonio Iannopollo, P. Nuzzo, S. Tripakis, A. Sangiovanni-Vincentelli
Given a global specification contract and a system described by a composition of contracts, system verification reduces to checking that the composite contract refines the specification contract, i.e. that any implementation of the composite contract implements the specification contract and is able to operate in any environment admitted by it. Contracts are captured using high-level declarative languages, for example, linear temporal logic (LTL). In this case, refinement checking reduces to an LTL satisfiability checking problem, which can be very expensive to solve for large composite contracts. This paper proposes a scalable refinement checking approach that relies on a library of contracts and local refinement assertions. We propose an algorithm that, given such a library, breaks down the refinement checking problem into multiple successive refinement checks, each of smaller scale. We illustrate the benefits of the approach on an industrial case study of an aircraft electric power system, with up to two orders of magnitude improvement in terms of execution time.
Citations: 24
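The abstract above defines refinement informally: every implementation of the refining contract implements the specification, and it works in every environment the specification admits. A small sketch of that condition follows, assuming assume-guarantee contracts over a finite set of behaviors rather than the LTL formulas the paper works with. The set-based encoding, the function names, and the sample contracts are all hypothetical.

```python
def saturate(assumptions, guarantees, universe):
    """Saturated guarantees: promised behaviors plus everything outside the assumptions."""
    return guarantees | (universe - assumptions)


def refines(c1, c2, universe):
    """True if contract c1 = (A1, G1) refines contract c2 = (A2, G2).

    c1 must accept every environment c2 accepts (A2 is a subset of A1) and
    promise no more behaviors than c2 allows (saturated G1 is a subset of
    saturated G2).
    """
    a1, g1 = c1
    a2, g2 = c2
    return a2 <= a1 and saturate(a1, g1, universe) <= saturate(a2, g2, universe)


# Hypothetical behaviors, identified by integers.
universe = {0, 1, 2, 3}
spec = ({0, 1}, {0, 2})      # specification contract
impl = ({0, 1, 2}, {0})      # weaker assumptions, stronger guarantees

print(refines(impl, spec, universe))
print(refines(spec, impl, universe))
```

With formulas instead of finite sets, each subset test becomes a validity or satisfiability query, which is where the cost discussed in the abstract comes from.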
GPU-EvR: Run-time event based real-time scheduling framework on GPGPU platform
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.233
Haeseung Lee, M. A. Faruque
GPU architecture has traditionally been used in graphics applications because of its enormous computing capability. More recently, GPU architecture has also been used for general-purpose computing. Most of the current scheduling frameworks developed to handle GPGPU workloads operate sequentially. This is problematic since the sequential approach may not scale for real-time systems, a consequence of its inability to support preemption. We propose a novel scheduling framework that provides real-time support for the GPGPU platform. In contrast to existing frameworks, our proposed framework considers both concurrent execution of applications on the GPU and mapping between streaming multiprocessors and thread blocks. By considering both concurrent execution and mapping, our framework is able to guarantee timing for up to 6.4 times as many applications compared to TimeGraph [9] and Global EDF [5]. In addition, our experimental applications use up to 20% less power under our scheduling framework compared to [5], [9].
Citations: 29
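The abstract above uses Global EDF as one of its baselines. As background, the sketch below shows the earliest-deadline-first rule in its simplest form: a preemptive scheduler on a single processor with unit time steps. This is a deliberate simplification and is neither Global EDF on multiple processors nor the GPU-EvR framework; the job tuple layout and the sample jobs are hypothetical.

```python
import heapq


def edf_schedule(jobs):
    """Simulate preemptive EDF on one processor in unit time steps.

    jobs: list of (name, release, exec_time, deadline) with integer times.
    Returns (timeline, missed): the job run in each busy slot and the
    names of jobs that finished after their deadline.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # ordered by release time
    ready = []  # min-heap of (deadline, name, remaining)
    timeline, missed = [], []
    t, i = 0, 0
    while i < len(pending) or ready:
        # Admit every job released by the current time.
        while i < len(pending) and pending[i][1] <= t:
            name, _, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if not ready:
            t = pending[i][1]  # processor idles until the next release
            continue
        # Run the ready job with the earliest deadline for one time unit.
        deadline, name, remaining = heapq.heappop(ready)
        timeline.append(name)
        t += 1
        if remaining > 1:
            heapq.heappush(ready, (deadline, name, remaining - 1))
        elif t > deadline:
            missed.append(name)
    return timeline, missed


jobs = [("A", 0, 2, 6), ("B", 1, 1, 2), ("C", 0, 1, 4)]
timeline, missed = edf_schedule(jobs)
print(timeline, missed)
```

Re-evaluating the ready queue every time unit is what makes the scheduler preemptive, the property the abstract says sequential GPGPU frameworks lack.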
A flexible ASIP architecture for connected components labeling in embedded vision applications
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.367
Juan Fernando Eusse Giraldo, R. Leupers, G. Ascheid, Patrick Sudowe, B. Leibe, Tamon Sadasue
Real-time identification of connected regions of pixels in large (e.g. FullHD) frames is a mandatory and expensive step in many computer vision applications that are becoming increasingly popular in embedded mobile devices such as smartphones, tablets and head-mounted devices. Standard off-the-shelf embedded processors are not yet able to cope with the performance/flexibility trade-offs required by such applications. Therefore, in this work we present an Application Specific Instruction Set Processor (ASIP) tailored to concurrently execute thresholding, connected components labeling and basic feature extraction of image frames. The proposed architecture is capable of coping with frame complexities ranging from QCIF to FullHD frames with 1 to 4 bytes-per-pixel formats, while achieving an average frame rate of 30 frames per second (fps). Synthesis was performed for a standard 65 nm CMOS library, obtaining an operating frequency of 350 MHz and an area of 2.1 mm². Moreover, evaluations were conducted both on typical and synthetic data sets, in order to thoroughly assess the achievable performance. Finally, an entire planar-marker based augmented reality application was developed and simulated for the ASIP.
Citations: 6
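The abstract above combines thresholding with connected components labeling. The following software sketch shows those two steps on a tiny grayscale frame, using a breadth-first flood fill with 4-connectivity. It is a plain reference algorithm for illustration, not the ASIP's instruction-level implementation; the function name, the threshold value, and the sample frame are hypothetical.

```python
from collections import deque


def label_components(gray, threshold):
    """Threshold a grayscale frame and label its 4-connected foreground regions.

    Returns (labels, count): a grid in which 0 marks background and
    1..count mark the regions, plus the number of regions found.
    """
    rows, cols = len(gray), len(gray[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if gray[r][c] < threshold or labels[r][c]:
                continue  # background pixel, or already labeled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and gray[ny][nx] >= threshold
                            and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


frame = [
    [200, 210, 0, 0],
    [0, 190, 0, 220],
    [0, 0, 0, 230],
]
labels, count = label_components(frame, threshold=128)
print(count)
for row in labels:
    print(row)
```

Per-region features such as area or bounding box can be accumulated inside the same flood-fill loop, which is one reason labeling and basic feature extraction are often fused.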
Optimal dimensioning of active cell balancing architectures
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.153
Swaminathan Narayanaswamy, S. Steinhorst, M. Lukasiewycz, M. Kauer, S. Chakraborty
This paper presents an approach to optimal dimensioning of active cell balancing architectures, which are of increasing relevance in Electrical Energy Storages (EESs) for Electric Vehicles (EVs) or stationary applications such as smart grids. Active cell balancing equalizes the state of charge of cells within a battery pack via charge transfers, increasing the effective capacity and lifetime. While optimization approaches have been introduced into the design process of several aspects of EESs, active cell balancing architectures have, until now, not been systematically optimized in terms of their components. Therefore, this paper analyzes existing architectures to develop design metrics for energy dissipation, installation volume, and balancing current. Based on these design metrics, a methodology to efficiently obtain Pareto-optimal configurations for a wide range of inductors and transistors at different balancing currents is developed. Our methodology is then applied to a case study, optimizing two state-of-the-art architectures using realistic balancing algorithms. The results give evidence of the applicability of systematic optimization in the domain of cell balancing, leading to higher energy efficiencies with minimized installation space.
Citations: 20
Cross-correlation of specification and RTL for soft IP analysis
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.303
B. Singh, Arunprasath Shankar, F. Wolff, C. Papachristou, D. Weyer, Steve Clay
Semiconductor companies often use 3rd party IPs in order to improve their design productivity. In practice, there are risks involved in using a 3rd party IP, as bugs may creep in due to versioning issues, poor documentation, and mismatches between specification and RTL. As a result, 3rd party IP specification and RTL must be carefully evaluated. Our methodology addresses this issue by cross-correlating specification and RTL to discover these discrepancies. The key innovative idea in our approach is to use prior, trusted experience about designs, including their specs and RTL code. We have captured this trusted experience in two knowledge bases (KB), Spec-KB and RTL-KB. Knowledge base rules are then used to cross-correlate the RTL blocks to the specs. We have tested our approach by analyzing several 3rd party IPs. We have defined metrics for specification coverage and RTL identification coverage to quantify our results.
Citations: 9
Energy efficient MIMO processing: A case study of opportunistic run-time approximations
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.220
D. Novo, Nazanin Farahpour, P. Ienne, U. Ahmad, F. Catthoor
Worst-case design is one of the keys to practical engineering: create solutions that can withstand the most adverse possible conditions. Yet the ever-growing need for higher energy efficiency suggests a grim outlook for worst-case design in the future. In this paper, we propose opportunistic run-time approximations to enable continuous adaptation of the processing precision (operator type and bitwidth) to the actual execution context without modifying the algorithm functionality. We show that, by relaxing the processing precision whenever possible, a VLSI implementation of an advanced wireless receiver algorithm based on opportunistic run-time approximations can save about 40% of the energy consumed by an optimized static implementation. These energy savings come at the expense of a slight increase in overall chip area.
Citations: 2