
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

An adaptive transmitting power technique for energy efficient mm-wave wireless NoCs
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.284
Andrea Mineo, M. Palesi, G. Ascia, V. Catania
Several emerging techniques have recently been proposed to alleviate the communication latency and energy consumption issues in multi/many-core architectures. One such technique, the wireless NoC (WiNoC), replaces traditional wired links with a wireless medium. Unfortunately, the energy consumed by the RF transceiver (the main building block of a WiNoC), and in particular by its transmitter, accounts for a significant fraction of the overall communication energy. In this paper we propose a runtime-tunable transmitting power technique for improving the energy efficiency of the transceiver in wireless NoC architectures. The basic idea is to tune the transmitting power based on the location of the recipient of the current communication. Integrating the proposed technique into two known WiNoC architectures, iWise64 and McWiNoC, resulted in energy reductions of 43% and 60%, respectively.
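The idea of choosing the transmitting power from the recipient's location can be illustrated with a small lookup-table sketch. This is not the authors' implementation: the table contents, the number of power steps, and the assumption that energy per packet scales linearly with the power step are all hypothetical and chosen only to make the example concrete.

```python
from typing import Dict

# Hypothetical: highest available power step (arbitrary units).
MAX_POWER_STEP = 7


def select_power_step(table: Dict[int, int], dest: int) -> int:
    """Return the power step for destination `dest`.

    `table` maps a destination ID to the lowest step assumed to reach it.
    Unknown destinations fall back to maximum power.
    """
    return table.get(dest, MAX_POWER_STEP)


def energy_saving(table: Dict[int, int], traffic: Dict[int, int]) -> float:
    """Fraction of transmit energy saved versus always using MAX_POWER_STEP.

    `traffic` maps destination ID to packet count. Illustrative assumption:
    energy per packet is proportional to the power step used.
    """
    baseline = sum(traffic.values()) * MAX_POWER_STEP
    tuned = sum(n * select_power_step(table, d) for d, n in traffic.items())
    return 1 - tuned / baseline
```

With a table of `{1: 2, 2: 5}` and ten packets to each of destinations 1, 2, and 3, the sketch transmits at steps 2, 5, and 7, and the modeled saving is one third of the always-maximum baseline.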
Citations: 28
Bus designs for time-probabilistic multicore processors
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.063
J. Jalle, Leonidas Kosmidis, J. Abella, E. Quiñones, F. Cazorla
Probabilistic Timing Analysis (PTA) reduces the amount of information needed to provide tight WCET estimates in real-time systems with respect to classic timing analysis. PTA imposes new requirements on hardware design that have been shown to be implementable for single-core architectures. However, no support has been proposed for multicores so far. In this paper, we propose several probabilistically-analysable bus designs for multicore processors, ranging from 4 cores connected with a single bus to 16 cores deploying a hierarchical bus design. We derive analytical models of the probabilistic timing behaviour for the different bus designs, show their suitability for PTA, and evaluate their hardware cost. Our results show that the proposed bus designs (i) fulfil PTA requirements, (ii) allow deriving WCET estimates with the same cost and complexity as in single-core processors, and (iii) provide higher guaranteed performance than single-core processors, 3.4x and 6.6x on average for an 8-core and a 16-core setup, respectively.
Citations: 53
A thermal resilient integration of many-core microprocessors and main memory by 2.5D TSI I/Os
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.190
Sih-Sian Wu, Kanwen Wang, Sai Manoj Pudukotai Dinakarrao, Tsung-Yi Ho, Mingbin Yu, Hao Yu
A memory-logic integration design platform is developed in this paper, with thermal reliability analysis provided for 2.5D through-silicon-interposer (TSI) and 3D through-silicon-via (TSV) based integrations. Temperature-dependent delay and power models have been developed at the microarchitecture level for 2.5D and 3D integrations of many-core microprocessors and main memory, respectively. Experiments are performed with general-purpose benchmarks from SPEC CPU2006 and cloud-oriented benchmarks from Phoenix, with the following observations. Memory-logic integration by 3D RC-interconnected TSV I/Os can result in thermal runaway failures due to strong electrical-thermal coupling. In contrast, integration by 2.5D transmission-line-interconnected TSI I/Os shows almost the same energy efficiency and better thermal resilience.
Citations: 16
A logic integrated optimal pin-count design for digital microfluidic biochips
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.088
Trung Anh Dinh, S. Yamashita, Tsung-Yi Ho
Digital microfluidic biochips have become one of the most promising technologies for biomedical experiments. In modern microfluidic technology, reducing the number of independent control pins, which largely determines the fabrication cost, power consumption, and reliability of a microfluidic system, is a key challenge for every digital microfluidic biochip design. However, all previous chip designs sacrifice optimality and achieve only a limited reduction in the number of control pins. Moreover, most existing designs cannot satisfy the high-throughput demands of bioassays and are thus inapplicable in practical contexts. In this paper, we propose the first optimal pin-count design scheme for digital microfluidic biochips. By integrating a very simple combinational logic circuit into the original chip, the proposed scheme can provide high throughput for bioassays with an information-theoretic minimum number of control pins. Furthermore, to cope with the rapid growth of chip scale, we also propose a scalable and efficient heuristic. Experiments demonstrate that the proposed scheme requires far fewer control pins than previous state-of-the-art works.
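The phrase "information-theoretic minimum" can be made concrete with a simple counting argument, sketched here under an assumption the abstract does not spell out: if an on-chip combinational circuit decodes the pin signals into one of P distinct electrode activation patterns, then k binary pins can select at most 2^k patterns, so at least ceil(log2 P) pins are needed. The function name and the pattern-count model are hypothetical and may differ from the bound derived in the paper.

```python
def min_control_pins(num_patterns: int) -> int:
    """Smallest k such that 2**k >= num_patterns (a pure counting bound).

    `num_patterns` is the number of distinct activation patterns the
    decoder must be able to select; this model is an assumption.
    """
    if num_patterns < 1:
        raise ValueError("num_patterns must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (num_patterns - 1).bit_length()
```

For example, under this model 5 distinct patterns need 3 pins, while 4 patterns fit in 2 pins.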
Citations: 9
Partitioned mixed-criticality scheduling on multiprocessor platforms
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.305
Chuancai Gu, Nan Guan, Qingxu Deng, W. Yi
Scheduling mixed-criticality systems, which integrate multiple functionalities with different criticality levels into a shared platform, is a challenging problem even on single-processor platforms. Multi-core processors are increasingly used in embedded systems and provide substantial computing capacity for such mixed-criticality systems. In this paper, we propose a partitioned scheduling algorithm, MPVD, that extends the state-of-the-art single-processor mixed-criticality scheduling algorithm EY to multiprocessor platforms. The key idea of MPVD is to evenly allocate tasks with different criticality levels to different processors, in order to better explore the asymmetry between different criticality levels and improve system schedulability. We then propose two enhancements to further improve the schedulability of MPVD. Experiments with randomly generated task sets show that our proposed approach significantly outperforms existing algorithms.
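The notion of spreading each criticality level evenly across processors can be sketched as follows. This is only an illustration of balanced partitioning, not the MPVD algorithm itself: the worst-fit-decreasing heuristic, the two-level "HI"/"LO" task model, and the absence of any per-processor schedulability test are all simplifying assumptions.

```python
from typing import List, Tuple

# Hypothetical task model: (name, criticality "HI" or "LO", utilization).
Task = Tuple[str, str, float]


def partition(tasks: List[Task], num_procs: int) -> List[List[str]]:
    """Assign tasks to processors, balancing each criticality level separately.

    Within a level, tasks are taken in decreasing utilization order and each
    goes to the processor with the least load of that same level (worst fit).
    """
    procs: List[List[str]] = [[] for _ in range(num_procs)]
    for level in ("HI", "LO"):
        load = [0.0] * num_procs  # per-processor load for this level only
        level_tasks = sorted(
            (t for t in tasks if t[1] == level), key=lambda t: -t[2]
        )
        for name, _, util in level_tasks:
            target = min(range(num_procs), key=lambda p: load[p])
            procs[target].append(name)
            load[target] += util
    return procs
```

Given two high-criticality and two low-criticality tasks on two processors, each processor ends up with one task of each level. A real partitioned scheduler would additionally run a schedulability test on every processor before accepting an assignment.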
Citations: 38
Rewiring for threshold logic circuit minimization
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.134
Chia-Chun Lin, Chun-Yao Wang, Yung-Chih Chen, Ching-Yi Huang
Much recent work has focused on the synthesis, verification, and testing of threshold circuits, owing to rapid progress in the efficient implementation of threshold logic circuits. To minimize the hardware cost of threshold circuit implementation, this paper proposes a heuristic that consists of rewiring operations and a simplification procedure. Additionally, this paper proves that a subset of a gate's input vectors, called critical-effect vectors, is sufficient for formally verifying the equivalence of two threshold logic gates, so the whole truth table need not be checked. This result can accelerate equivalence checking of two threshold logic gates. The experimental results show that the proposed heuristic can efficiently reduce the cost.
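For readers unfamiliar with threshold logic, the sketch below shows a threshold gate and the exhaustive truth-table equivalence check that serves as the baseline. It uses the standard definition (output 1 when the weighted input sum reaches the threshold); the function names are hypothetical, and the critical-effect-vector method from the paper is not reproduced here.

```python
from itertools import product
from typing import Sequence


def threshold_gate(weights: Sequence[int], threshold: int,
                   inputs: Sequence[int]) -> int:
    """Output 1 if the weighted sum of binary inputs is >= threshold, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def equivalent(w1: Sequence[int], t1: int,
               w2: Sequence[int], t2: int) -> bool:
    """Compare two gates on every input vector (2**n evaluations)."""
    if len(w1) != len(w2):
        raise ValueError("gates must have the same number of inputs")
    return all(
        threshold_gate(w1, t1, v) == threshold_gate(w2, t2, v)
        for v in product((0, 1), repeat=len(w1))
    )
```

For instance, the gates with weights (1, 1), threshold 2 and weights (2, 2), threshold 3 both realize a 2-input AND, so the check reports them equivalent. The exhaustive check grows exponentially with the number of inputs, which is why a smaller sufficient set of vectors is attractive.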
Citations: 12
Thermal management of batteries using a hybrid supercapacitor architecture
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.344
Donghwa Shin, M. Poncino, E. Macii
Thermal analysis and management of batteries have been an important research issue for battery-operated systems such as electric vehicles and mobile devices. Nowadays, battery packs are designed with heat dissipation in mind, and external cooling devices such as cooling fans are also widely used to improve the reliability and extend the lifetime of a battery. Approaches of this type, which enhance cooling efficiency by reducing thermal resistance, cannot achieve the immediate temperature drop needed to avoid a thermal emergency. Approaches based on removing the heat at the source via idle period insertion (similar to what is done for silicon devices) would allow faster thermal response; however, it is not obvious how to implement these schemes in the context of batteries. In this paper, we propose the use of a simple parallel battery-supercapacitor hybrid architecture with a dual-mode discharging strategy that can provide immediate temperature management, in which the supercapacitor is used as an energy buffer during the idle periods of the battery. Simulation results show that the proposed method can keep the battery temperature within the safe range without external cooling devices while exploiting the advantage of the battery-supercapacitor parallel connection.
Citations: 15
Code generation for embedded heterogeneous architectures on Android
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.099
Richard Membarth, Oliver Reiche, Frank Hannig, J. Teich
The success of Android is based on its unified Java programming model, which allows writing platform-independent programs for a variety of target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow writing native applications and exploiting multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques that target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and make it possible to target these APIs and OpenCL from a common description. We further avoid memory transfers by sharing the same memory region among different processing elements on HSA platforms. As a reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.
Citations: 33
An embedded offset and gain instrument for OpAmp IPs
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.031
J. Wan, H. Kerkhoff
Analog and mixed-signal IPs are increasingly required to use digital fabrication technologies and are deeply embedded into systems-on-chip (SoCs). These developments place additional requirements and challenges on analog testing methodologies. Traditional analog testing methods suffer from limited access to and control of these embedded analog circuits in SoCs. As an alternative, an embedded instrument for analog OpAmp IP tests is proposed in this paper. It can provide the exact gain and offset values of OpAmps instead of only a pass/fail result. Moreover, it is a non-invasive monitor and can work online without isolating the OpAmp under test (DUT) from its surrounding feedback networks, and it does not require accurate test stimuli. In addition, the monitor can remove its own offsets without additional complex self-calibration circuits. All self-calibrations are completed in the digital domain after each measurement in real time. It is therefore also suitable for aging-sensitive applications, in which the monitor itself may suffer from aging mechanisms and additional offset drift. The monitor measurement range is 0.2 mV to 70 mV for offset and 0 dB to 40 dB for gain. The offset measurement error can be 10% of the measured value plus or minus 0.1 mV, and the gain measurement error can be -2.5 dB.
Citations: 4
Joint communication scheduling and interconnect synthesis for FPGA-based many-core systems
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.352
A. Cilardo, E. Fusella, L. Gallo, A. Mazzeo
This work proposes an automated methodology for optimizing FPGA-based many-core interconnect architectures. Based on the application communication requirements, the methodology concurrently defines the structure of the interconnect and the communication task scheduling, taking into account possible dependencies between tasks under given area constraints. The resulting architecture improves the level of communication parallelism that can be exploited while keeping area costs low. The paper thoroughly describes the proposed approach and discusses a few case-studies showing the impact of the proposed technique.
Citations: 15
Journal
2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)