Several emerging techniques have been recently proposed for alleviating the communication latency and the energy consumption issues in multi/many-core architectures. One of such emerging communication techniques, namely, WiNoC replaces the traditional wired links with the use of wireless medium. Unfortunately, the energy consumed by the RF transceiver (i.e., the main building block of a WiNoC), and in particular by its transmitter, accounts for a significant fraction of the overall communication energy. In this paper we propose a runtime tunable transmitting power technique for improving the energy efficiency of the transceiver in wireless NoC architectures. The basic idea is tuning the transmitting power based on the location of the recipient of the current communication. The integration of the proposed technique into two known WiNoC architectures, namely, iWise64 and McWiNoC resulted in an energy reduction of 43% and 60%, respectively.
{"title":"An adaptive transmitting power technique for energy efficient mm-wave wireless NoCs","authors":"Andrea Mineo, M. Palesi, G. Ascia, V. Catania","doi":"10.7873/DATE.2014.284","DOIUrl":"https://doi.org/10.7873/DATE.2014.284","url":null,"abstract":"Several emerging techniques have been recently proposed for alleviating the communication latency and the energy consumption issues in multi/many-core architectures. One of such emerging communication techniques, namely, WiNoC replaces the traditional wired links with the use of wireless medium. Unfortunately, the energy consumed by the RF transceiver (i.e., the main building block of a WiNoC), and in particular by its transmitter, accounts for a significant fraction of the overall communication energy. In this paper we propose a runtime tunable transmitting power technique for improving the energy efficiency of the transceiver in wireless NoC architectures. The basic idea is tuning the transmitting power based on the location of the recipient of the current communication. The integration of the proposed technique into two known WiNoC architectures, namely, iWise64 and McWiNoC resulted in an energy reduction of 43% and 60%, respectively.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"13 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73934916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. Jalle, Leonidas Kosmidis, J. Abella, E. Quiñones, F. Cazorla
Probabilistic Timing Analysis (PTA) reduces the amount of information needed to provide tight WCET estimates in real-time systems with respect to classic timing analysis. PTA imposes new requirements on hardware design that have been shown implementable for single-core architectures. However, no support has been proposed for multicores so far. In this paper, we propose several probabilistically-analysable bus designs for multicore processors ranging from 4 cores connected with a single bus, to 16 cores deploying a hierarchical bus design. We derive analytical models of the probabilistic timing behaviour for the different bus designs, show their suitability for PTA and evaluate their hardware cost. Our results show that the proposed bus designs (i) fulfil PTA requirements, (ii) allow deriving WCET estimates with the same cost and complexity as in single-core processors, and (iii) provide higher guaranteed performance than single-core processors, 3.4x and 6.6x on average for an 8-core and a 16-core setup respectively.
{"title":"Bus designs for time-probabilistic multicore processors","authors":"J. Jalle, Leonidas Kosmidis, J. Abella, E. Quiñones, F. Cazorla","doi":"10.7873/DATE2014.063","DOIUrl":"https://doi.org/10.7873/DATE2014.063","url":null,"abstract":"Probabilistic Timing Analysis (PTA) reduces the amount of information needed to provide tight WCET estimates in real-time systems with respect to classic timing analysis. PTA imposes new requirements on hardware design that have been shown implementable for single-core architectures. However, no support has been proposed for multicores so far. In this paper, we propose several probabilistically-analysable bus designs for multicore processors ranging from 4 cores connected with a single bus, to 16 cores deploying a hierarchical bus design. We derive analytical models of the probabilistic timing behaviour for the different bus designs, show their suitability for PTA and evaluate their hardware cost. Our results show that the proposed bus designs (i) fulfil PTA requirements, (ii) allow deriving WCET estimates with the same cost and complexity as in single-core processors, and (iii) provide higher guaranteed performance than single-core processors, 3.4x and 6.6x on average for an 8-core and a 16-core setup respectively.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"70 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75859571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One memory-logic-integration design platform is developed in this paper with thermal reliability analysis provided for 2.5D through-silicon-interposer (TSI) and 3D through-silicon-via (TSV) based integrations. Temperature-dependent delay and power models have been developed at microarchitecture level for 2.5D and 3D integrations of many-core microprocessors and main memory, respectively. Experiments are performed by general-purpose benchmarks from SPEC CPU2006 and also cloud-oriented benchmarks from Phoenix with the following observations. The memory-logic integration by 3D RC-interconnected TSV I/Os can result in thermal runaway failures due to strong electrical-thermal couplings. On the other hand, the one by 2.5D transmission-line-interconnected TSI I/Os has shown almost the same energy efficiency and better thermal resilience.
{"title":"A thermal resilient integration of many-core microprocessors and main memory by 2.5D TSI I/Os","authors":"Sih-Sian Wu, Kanwen Wang, Sai Manoj Pudukotai Dinakarrao, Tsung-Yi Ho, Mingbin Yu, Hao Yu","doi":"10.7873/DATE.2014.190","DOIUrl":"https://doi.org/10.7873/DATE.2014.190","url":null,"abstract":"One memory-logic-integration design platform is developed in this paper with thermal reliability analysis provided for 2.5D through-silicon-interposer (TSI) and 3D through-silicon-via (TSV) based integrations. Temperature-dependent delay and power models have been developed at microarchitecture level for 2.5D and 3D integrations of many-core microprocessors and main memory, respectively. Experiments are performed by general-purpose benchmarks from SPEC CPU2006 and also cloud-oriented benchmarks from Phoenix with the following observations. The memory-logic integration by 3D RC-interconnected TSV I/Os can result in thermal runaway failures due to strong electrical-thermal couplings. On the other hand, the one by 2.5D transmission-line-interconnected TSI I/Os has shown almost the same energy efficiency and better thermal resilience.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"19 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74725003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital microfluidic biochips have become one of the most promising technologies for biomedical experiments. In modern microfluidic technology, reducing the number of independent control pins that reflects most of the fabrication cost, power consumption and reliability of a microfluidic system, is a key challenge for every digital microfluidic biochip design. However, all the previous chip designs sacrifice the optimality of the problem, and only limited reduction on the number of control pins is observed. Moreover, most existing designs cannot satisfy high-throughput demand for bioassays, and thus inapplicable in practical contexts. In this paper, we propose the first optimal pin-count design scheme for digital microfluidic biochips. By integrating a very simple combinational logic circuit into the original chip, the proposed scheme can provide high-throughput for bioassays with an information-theoretic minimum number of control pins. Furthermore, to cope with the rapid growth of the chip's scale, we also propose a scalable and efficient heuristics. Experiments demonstrate that the proposed scheme can obtain much fewer number of control pins compared with the previous state-of-the-art works.
{"title":"A logic integrated optimal pin-count design for digital microfluidic biochips","authors":"Trung Anh Dinh, S. Yamashita, Tsung-Yi Ho","doi":"10.7873/DATE.2014.088","DOIUrl":"https://doi.org/10.7873/DATE.2014.088","url":null,"abstract":"Digital microfluidic biochips have become one of the most promising technologies for biomedical experiments. In modern microfluidic technology, reducing the number of independent control pins that reflects most of the fabrication cost, power consumption and reliability of a microfluidic system, is a key challenge for every digital microfluidic biochip design. However, all the previous chip designs sacrifice the optimality of the problem, and only limited reduction on the number of control pins is observed. Moreover, most existing designs cannot satisfy high-throughput demand for bioassays, and thus inapplicable in practical contexts. In this paper, we propose the first optimal pin-count design scheme for digital microfluidic biochips. By integrating a very simple combinational logic circuit into the original chip, the proposed scheme can provide high-throughput for bioassays with an information-theoretic minimum number of control pins. Furthermore, to cope with the rapid growth of the chip's scale, we also propose a scalable and efficient heuristics. Experiments demonstrate that the proposed scheme can obtain much fewer number of control pins compared with the previous state-of-the-art works.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"13 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74753594","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scheduling mixed-criticality systems that integrate multiple functionalities with different criticality levels into a shared platform appears to be a challenging problem, even on single-processor platforms. Multi-core processors are more and more widely used in embedded systems, which provide great computing capacities for such mixed-criticality systems. In this paper, we propose a partitioned scheduling algorithm MPVD to extend the state-of-the-art single-processor mixed-criticality scheduling algorithm EY to multiprocessor platforms. The key idea of MPVD is to evenly allocate tasks with different criticality levels to different processors, in order to better explore the asymmetry between different criticality levels and improve the system schedulability. Then we propose two enhancements to further improve the schedulability of MPVD. Experiments with randomly generated task sets show significant performance improvement of our proposed approach over existing algorithms.
{"title":"Partitioned mixed-criticality scheduling on multiprocessor platforms","authors":"Chuancai Gu, Nan Guan, Qingxu Deng, W. Yi","doi":"10.7873/DATE.2014.305","DOIUrl":"https://doi.org/10.7873/DATE.2014.305","url":null,"abstract":"Scheduling mixed-criticality systems that integrate multiple functionalities with different criticality levels into a shared platform appears to be a challenging problem, even on single-processor platforms. Multi-core processors are more and more widely used in embedded systems, which provide great computing capacities for such mixed-criticality systems. In this paper, we propose a partitioned scheduling algorithm MPVD to extend the state-of-the-art single-processor mixed-criticality scheduling algorithm EY to multiprocessor platforms. The key idea of MPVD is to evenly allocate tasks with different criticality levels to different processors, in order to better explore the asymmetry between different criticality levels and improve the system schedulability. Then we propose two enhancements to further improve the schedulability of MPVD. Experiments with randomly generated task sets show significant performance improvement of our proposed approach over existing algorithms.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"17 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73254836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recently, many works have been focused on synthesis, verification, and testing of threshold circuits due to the rapid development in efficient implementation of threshold logic circuits. To minimize the hardware cost of threshold circuit implementation, this paper proposes a heuristic that consists of rewiring operations and a simplification procedure. Additionally, a subset of input vectors of a gate, called critical-effect vectors, are proved to be complete for formally verifying the equivalence of two threshold logic gates, instead of the whole truth table in this paper. This achievement can accelerate the equivalence checking of two threshold logic gates. The experimental results show that the proposed heuristic can efficiently reduce the cost.
{"title":"Rewiring for threshold logic circuit minimization","authors":"Chia-Chun Lin, Chun-Yao Wang, Yung-Chih Chen, Ching-Yi Huang","doi":"10.7873/DATE.2014.134","DOIUrl":"https://doi.org/10.7873/DATE.2014.134","url":null,"abstract":"Recently, many works have been focused on synthesis, verification, and testing of threshold circuits due to the rapid development in efficient implementation of threshold logic circuits. To minimize the hardware cost of threshold circuit implementation, this paper proposes a heuristic that consists of rewiring operations and a simplification procedure. Additionally, a subset of input vectors of a gate, called critical-effect vectors, are proved to be complete for formally verifying the equivalence of two threshold logic gates, instead of the whole truth table in this paper. This achievement can accelerate the equivalence checking of two threshold logic gates. The experimental results show that the proposed heuristic can efficiently reduce the cost.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"10 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75293774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thermal analysis and management of batteries have been an important research issue for battery-operated systems such as electric vehicles and mobile devices. Nowadays, battery packs are designed considering heat dissipation, and external cooling devices such as a cooling fan are also widely used to enforce the reliability and extend the lifetime of a battery. This type of approaches that target the enhancement of the cooling efficiency via the reduction of the thermal resistance cannot achieve an immediate temperature drop to avoid a thermal emergency situation. Approaches based on removing the heat from the heat sources via idle period insertion (similar to what is done for silicon devices) would allow faster thermal response; however it is not obvious how to implement these schemes in the context of batteries. In this paper, we propose the use of a simple parallel battery-supercapacitor hybrid architecture with a dual-mode discharging strategy that can provide immediate temperature management, in which the supercapacitor is used as an energy buffer during the idle periods of the battery. Simulation results shows that the proposed method can keep the battery temperature within the safe range without external cooling devices while exploiting the advantage of the battery-supercapacitor parallel connection.
{"title":"Thermal management of batteries using a hybrid supercapacitor architecture","authors":"Donghwa Shin, M. Poncino, E. Macii","doi":"10.7873/DATE.2014.344","DOIUrl":"https://doi.org/10.7873/DATE.2014.344","url":null,"abstract":"Thermal analysis and management of batteries have been an important research issue for battery-operated systems such as electric vehicles and mobile devices. Nowadays, battery packs are designed considering heat dissipation, and external cooling devices such as a cooling fan are also widely used to enforce the reliability and extend the lifetime of a battery. This type of approaches that target the enhancement of the cooling efficiency via the reduction of the thermal resistance cannot achieve an immediate temperature drop to avoid a thermal emergency situation. Approaches based on removing the heat from the heat sources via idle period insertion (similar to what is done for silicon devices) would allow faster thermal response; however it is not obvious how to implement these schemes in the context of batteries. In this paper, we propose the use of a simple parallel battery-supercapacitor hybrid architecture with a dual-mode discharging strategy that can provide immediate temperature management, in which the supercapacitor is used as an energy buffer during the idle periods of the battery. Simulation results shows that the proposed method can keep the battery temperature within the safe range without external cooling devices while exploiting the advantage of the battery-supercapacitor parallel connection.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"123 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79479869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Richard Membarth, Oliver Reiche, Frank Hannig, J. Teich
The success of Android is based on its unified Java programming model that allows to write platform-independent programs for a variety of different target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow to write native applications and to exploit multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques in order to target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and allow to target these APIs and OpenCL from a common description. We further supersede memory transfers by sharing the same memory region among different processing elements on HSA platforms. As reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.
Android的成功基于其统一的Java编程模型,该模型允许为各种不同的目标平台编写独立于平台的程序。然而,这是以性能为代价的。因此,Google引入了api,允许编写本机应用程序,并利用多核以及用于计算密集型部件的嵌入式gpu。本文提出了针对Renderscript和Filterscript api的代码生成技术。Renderscript利用多核cpu和统一的着色器gpu,而更受限制的Filterscript也支持早期着色器模型的gpu。我们的技术专注于图像处理应用程序,并允许从一个共同的描述中针对这些api和OpenCL。我们进一步通过在HSA平台上的不同处理元素之间共享相同的内存区域来取代内存传输。作为参考,我们使用嵌入式平台承载多核ARM CPU和ARM Mali GPU。我们表明,我们生成的源代码比OpenCV中的本机实现更快,也比Google提供的用于在嵌入式GPU上加速的预实现脚本内在特性更快。
{"title":"Code generation for embedded heterogeneous architectures on android","authors":"Richard Membarth, Oliver Reiche, Frank Hannig, J. Teich","doi":"10.7873/DATE.2014.099","DOIUrl":"https://doi.org/10.7873/DATE.2014.099","url":null,"abstract":"The success of Android is based on its unified Java programming model that allows to write platform-independent programs for a variety of different target platforms. However, this comes at the cost of performance. As a consequence, Google introduced APIs that allow to write native applications and to exploit multiple cores as well as embedded GPUs for compute-intensive parts. This paper proposes code generation techniques in order to target the Renderscript and Filterscript APIs. Renderscript harnesses multi-core CPUs and unified shader GPUs, while the more restricted Filterscript also supports GPUs with earlier shader models. Our techniques focus on image processing applications and allow to target these APIs and OpenCL from a common description. We further supersede memory transfers by sharing the same memory region among different processing elements on HSA platforms. As reference, we use an embedded platform hosting a multi-core ARM CPU and an ARM Mali GPU. We show that our generated source code is faster than native implementations in OpenCV as well as the pre-implemented script intrinsics provided by Google for acceleration on the embedded GPU.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81277680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analog and mixed-signal IPs are increasingly required to use digital fabrication technologies and are deeply embedded into system-on-chips (SoC). These developments append more requirements and challenges on analog testing methodologies. Traditional analog testing methods suffer from less accessibility and control with regard to these embedded analog circuits in SoCs. As an alternative, an embedded instrument for analog OpAmp IP tests is proposed in this paper. It can provide the exact gain and offset values of OpAmps instead of only pass/fail result. What's more, it is an non-invasive monitor and can work online without isolating the DUT Opamp from its surrounding feedback networks. Nor does it require accurate test stimulations. In addition, the monitor can remove its own offsets without additional complex self-calibration circuits. All self-calibrations are completed in the digital domain after each measurement in real time. Therefore it is also suitable for aging-sensitive applications, in which the monitor may suffer from aging mechanisms and has additional offset drifts as well. The monitor measurement range for offset is from 0.2mV to 70mV, and for gain it is from 0dB to 40dB. The error for offset measurements can be 10% of the measurement value with plus/minus 0.1mV, and -2.5dB for gain measurements.
{"title":"An embedded offset and gain instrument for OpAmp IPs","authors":"J. Wan, H. Kerkhoff","doi":"10.7873/DATE.2014.031","DOIUrl":"https://doi.org/10.7873/DATE.2014.031","url":null,"abstract":"Analog and mixed-signal IPs are increasingly required to use digital fabrication technologies and are deeply embedded into system-on-chips (SoC). These developments append more requirements and challenges on analog testing methodologies. Traditional analog testing methods suffer from less accessibility and control with regard to these embedded analog circuits in SoCs. As an alternative, an embedded instrument for analog OpAmp IP tests is proposed in this paper. It can provide the exact gain and offset values of OpAmps instead of only pass/fail result. What's more, it is an non-invasive monitor and can work online without isolating the DUT Opamp from its surrounding feedback networks. Nor does it require accurate test stimulations. In addition, the monitor can remove its own offsets without additional complex self-calibration circuits. All self-calibrations are completed in the digital domain after each measurement in real time. Therefore it is also suitable for aging-sensitive applications, in which the monitor may suffer from aging mechanisms and has additional offset drifts as well. The monitor measurement range for offset is from 0.2mV to 70mV, and for gain it is from 0dB to 40dB. The error for offset measurements can be 10% of the measurement value with plus/minus 0.1mV, and -2.5dB for gain measurements.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"34 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85093077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This work proposes an automated methodology for optimizing FPGA-based many-core interconnect architectures. Based on the application communication requirements, the methodology concurrently defines the structure of the interconnect and the communication task scheduling, taking into account possible dependencies between tasks under given area constraints. The resulting architecture improves the level of communication parallelism that can be exploited while keeping area costs low. The paper thoroughly describes the proposed approach and discusses a few case-studies showing the impact of the proposed technique.
{"title":"Joint communication scheduling and interconnect synthesis for FPGA-based many-core systems","authors":"A. Cilardo, E. Fusella, L. Gallo, A. Mazzeo","doi":"10.7873/DATE.2014.352","DOIUrl":"https://doi.org/10.7873/DATE.2014.352","url":null,"abstract":"This work proposes an automated methodology for optimizing FPGA-based many-core interconnect architectures. Based on the application communication requirements, the methodology concurrently defines the structure of the interconnect and the communication task scheduling, taking into account possible dependencies between tasks under given area constraints. The resulting architecture improves the level of communication parallelism that can be exploited while keeping area costs low. The paper thoroughly describes the proposed approach and discusses a few case-studies showing the impact of the proposed technique.","PeriodicalId":6550,"journal":{"name":"2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"6 1","pages":"1-4"},"PeriodicalIF":0.0,"publicationDate":"2014-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85533912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}