
IPSJ Transactions on System LSI Design Methodology: Latest Articles

Energy-efficient High-level Synthesis for HDR Architectures with Clock Gating Based on Concurrency-oriented Scheduling
Q4 Engineering Pub Date : 2013-01-01 DOI: 10.2197/ipsjtsldm.6.101
Hiroyuki Akasaka, Shin-ya Abe, M. Yanagisawa, N. Togawa
With the miniaturization of LSIs and their increasing performance, demand for highly functional portable devices has grown significantly. At the same time, battery lifetime and device overheating have become major design problems that hamper further LSI integration. In addition, the ratio of interconnection delay to gate delay continues to increase as device feature size decreases. Interconnection delays must therefore be estimated, and energy consumption reduced, as early as the high-level synthesis stage. In this paper, we propose a high-level synthesis algorithm for huddle-based distributed-register architectures (HDR architectures) with clock gating, based on concurrency-oriented scheduling and functional unit binding. We assume coarse-grained clock gating applied to huddles, and we focus on the number of control steps, called gating steps, at which clock gating can be applied to the registers in each huddle. We propose two methods to increase the number of gating steps. The first schedules and binds operations so that they are performed at the same time; by adjusting clock gating timings during high-level synthesis, we expect a larger clock gating effect than when clock gating is applied after logic synthesis. The second synthesizes huddles so that each huddle contains registers with similar or identical clock gating timings. The clock gating timings are then determined so as to minimize total energy consumption, including clock tree energy. Experimental results show that the proposed algorithm reduces energy consumption by up to 23.8% compared with several conventional algorithms.
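The notion of gating steps in the abstract above can be illustrated with a minimal Python sketch. It assumes, for illustration only, that a huddle can be clock-gated at any control step in which none of its registers is written; the function name `gating_steps`, the register names, and the 6-step schedule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def gating_steps(num_steps: int,
                 huddles: Dict[str, List[str]],
                 writes: Dict[str, Set[int]]) -> Dict[str, int]:
    """Count, per huddle, the control steps in which no register is written.

    Assumption: a huddle may be gated exactly in those steps.
    """
    result = {}
    for huddle, registers in huddles.items():
        active: Set[int] = set()
        for reg in registers:
            # Union of all steps in which any register of the huddle is written.
            active |= writes.get(reg, set())
        result[huddle] = sum(1 for step in range(num_steps) if step not in active)
    return result


# Hypothetical schedule: control steps in which each register is written.
writes = {"r0": {0, 1}, "r1": {0, 1}, "r2": {3}, "r3": {4, 5}}

# Two hypothetical ways of grouping the same registers into huddles.
mixed = {"h0": ["r0", "r2"], "h1": ["r1", "r3"]}
grouped = {"h0": ["r0", "r1"], "h1": ["r2", "r3"]}

mixed_steps = gating_steps(6, mixed, writes)
grouped_steps = gating_steps(6, grouped, writes)
print(mixed_steps, grouped_steps)
```

In this toy schedule, placing registers with the same write timings in one huddle raises the total number of gating steps from 5 to 7, which is the effect the second method in the abstract aims for.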
Citations: 4
A Method to Reduce Energy Consumption of Conditional Operations with Execution Probabilities
Q4 Engineering Pub Date : 2013-01-01 DOI: 10.2197/ipsjtsldm.6.60
Kazuhito Ito, Kazuhiko Kameda
In conditional processing, operations are executed conditionally based on the results of condition operations. While speculative execution of conditional operations achieves higher processing speed, the speculatively executed operations may consume unnecessary energy. This paper considers reducing the energy consumption of conditional processing under time and resource constraints. An efficient method to calculate the probability of operation execution is presented. Based on these execution probabilities, a scheduling exploration using simulated annealing and a heuristic scheduling algorithm are proposed to minimize the energy consumption of conditional processing by reducing unnecessary speculative operations. Experimental results show that the proposed methods reduce energy by 5% to 10% for the same resource configuration.
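As a rough illustration of why execution probabilities matter for speculation, the sketch below computes the expected energy wasted when an operation is executed speculatively. It assumes that branch outcomes are independent, so an operation's execution probability is the product of the probabilities of the conditions guarding it; this simplification, the function names, and the numbers are hypothetical and do not reproduce the paper's method.

```python
from math import prod
from typing import List, Tuple


def execution_probability(guard_probs: List[float]) -> float:
    """Probability that an operation is actually needed.

    Assumption: the guarding conditions are independent.
    """
    return prod(guard_probs)


def expected_wasted_energy(ops: List[Tuple[float, List[float]]]) -> float:
    """Expected energy spent on speculated operations whose results are discarded.

    Each op is (energy, guard_probs); an op is wasted with probability 1 - p.
    """
    return sum(energy * (1.0 - execution_probability(guards))
               for energy, guards in ops)


# Hypothetical operations: (energy in arbitrary units, guard probabilities).
ops = [
    (8.0, [0.5]),        # needed half of the time
    (4.0, [0.5, 0.5]),   # nested condition, needed a quarter of the time
]
waste = expected_wasted_energy(ops)
print(waste)
```

A scheduler that knows these probabilities can prefer to speculate operations that are likely to be needed and delay those with a high expected waste, subject to the time and resource constraints.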
Citations: 0
Loop Fusion with Outer Loop Shifting for High-level Synthesis
Q4 Engineering Pub Date : 2013-01-01 DOI: 10.2197/ipsjtsldm.6.71
Y. Kato, Kenshu Seto
Loop fusion is often necessary before high-level synthesis (HLS) can be applied successfully. Although promising loop optimization tools based on the polyhedral model, such as Pluto, have been proposed, they sometimes cannot fuse loops into fully nested loops. This paper proposes an effective loop transformation called Outer Loop Shifting (OLS) that facilitates successful loop fusion. With HLS, we found that OLS generates hardware with 25% fewer execution cycles on average than Pluto alone for four benchmark programs.
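To show how shifting can make fusion legal, here is a small Python sketch of generic one-dimensional loop shifting. It is not the paper's OLS transformation, which targets outer loops of loop nests; the arrays and function names are made up for illustration. The second loop reads `b[i + 1]`, so fusing the loops directly would read a value that has not been produced yet. Delaying the consumer by one iteration removes that problem.

```python
from typing import List


def separate_loops(a: List[int]) -> List[int]:
    """Reference version: producer loop followed by consumer loop."""
    n = len(a)
    b = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        b[i] = a[i] * 2
    for i in range(n - 1):
        c[i] = b[i] + b[i + 1]   # needs b[i + 1], produced one iteration later
    return c


def fused_with_shift(a: List[int]) -> List[int]:
    """Fused version: the consumer statement is shifted by one iteration."""
    n = len(a)
    b = [0] * n
    c = [0] * max(n - 1, 0)
    for i in range(n):
        b[i] = a[i] * 2
        if i >= 1:
            # Shifted consumer: computes c[i - 1] once b[i] is available.
            c[i - 1] = b[i - 1] + b[i]
    return c


result = fused_with_shift([1, 2, 3, 4])
print(result)
```

Both functions return the same values, but the fused version traverses the index range only once, which is the kind of structure an HLS tool can pipeline as a single loop.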
Citations: 4
An FPGA Implementation of a HOG-based Object Detection Processor
Q4 Engineering Pub Date : 2013-01-01 DOI: 10.2197/ipsjtsldm.6.42
Kosuke Mizuno, Yosuke Terachi, Kenta Takagi, S. Izumi, H. Kawaguchi, M. Yoshimoto
This paper describes a Histogram of Oriented Gradients (HOG)-based object detection processor. It features a simplified HOG algorithm with cell-based scanning and simultaneous Support Vector Machine (SVM) calculation, a cell-based pipeline architecture, and parallelized modules. To evaluate the effectiveness of our approach, the proposed architecture is implemented on an FPGA prototyping board. Results show that the proposed architecture can generate HOG features and detect objects at a 40 MHz clock frequency for SVGA resolution video (800 × 600 pixels) at 72 frames per second (fps).
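The throughput figures quoted above imply a tight cycle budget per pixel. The short calculation below works it out from the numbers in the abstract only; it assumes every pixel of every frame is processed and ignores blanking intervals, which the abstract does not discuss.

```python
# Figures taken from the abstract.
width, height = 800, 600        # SVGA resolution
fps = 72                        # frames per second
clock_hz = 40_000_000           # 40 MHz

pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * fps

# Average number of clock cycles available per input pixel.
cycles_per_pixel = clock_hz / pixels_per_second

print(pixels_per_second)            # 34560000
print(round(cycles_per_pixel, 3))   # 1.157
```

With only about 1.16 clock cycles available per pixel, the design has to sustain close to one pixel per cycle, which is consistent with the pipelined and parallelized architecture the abstract describes.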
Citations: 14
A non-volatile reconfigurable offloader for wireless sensor nodes
Q4 Engineering Pub Date : 2012-12-25 DOI: 10.1145/2460216.2460232
S. Nakaya, M. Miyamura, N. Sakimura, Yuichi Nakamura, T. Sugibayashi
Energy saving is currently one of the most important issues in the development of battery-powered wireless sensor nodes (WSNs). We have developed a non-volatile reconfigurable offloader for flexible and highly efficient processing on WSNs. It uses NanoBridges (NBs), which are novel non-volatile and reprogrammable switching elements. Non-volatility is essential for the intermittent operation of WSNs because power-on must not require loading configuration data. We implemented a data compression algorithm on the offloader that reduces energy consumption during data transmission. Simulation results showed that the energy consumption of the offloader was 1/21 of that of an ultra-low-power CPU.
Citations: 4
Optimized Communication and Synchronization for Embedded Multiprocessors Using ASIP Methodology
Q4 Engineering Pub Date : 2012-10-01 DOI: 10.2197/ipsjtsldm.5.118
Hao Xiao, T. Isshiki, Dongju Li, H. Kunieda, Yuko Nakase, Sadahiro Kimura
Inter-processor communication and synchronization are critical problems in embedded multiprocessors. To achieve high-speed communication and low-latency synchronization, most recent designs employ dedicated hardware engines to support these communication protocols individually, an approach that is complex, inflexible, and error-prone. This paper therefore motivates the optimization of inter-processor communication and synchronization using application-specific instruction-set processor (ASIP) techniques. The proposed communication mechanism is based on a set of custom instructions coupled with a low-latency on-chip network, which provides efficient support for both data transfer and process synchronization. Using a state-of-the-art ASIP design methodology, we embed the communication functionality into a base processor, giving the proposed mechanism ultra-low overhead. More importantly, industry-standard-compatible programming interfaces supporting both the message-passing and shared-memory paradigms are exposed to end users to ease software porting. Experimental results show that the proposed message-passing protocol achieves a bandwidth of up to 703 Mbyte/s at 200 MHz, and that the latency of the proposed synchronization protocol is reduced by more than 81% compared with the conventional approach. Moreover, as a case study, we show the effectiveness of the proposed communication mechanism in a real-life embedded application, WiMedia UWB MAC.
Citations: 16
An Exact Estimation Algorithm of Error Propagation Probability for Sequential Circuits
Q4 Engineering Pub Date : 2012-08-17 DOI: 10.2197/ipsjtsldm.5.63
Masayoshi Yoshimura, Y. Akamine, Y. Matsunaga
In advanced integrated circuit technology, soft error tolerance is low, and soft errors ultimately lead to failures in VLSIs. We propose a method for the exact estimation of error propagation probabilities in sequential circuits whose flip-flops (FFs) latch failure values. Failure due to soft errors in sequential circuits is defined using a modified product machine, which monitors whether failure values appear at any primary output. The behavior of the modified product machine is analyzed with a Markov model. The probabilities that failure values latched into the FFs appear at any primary output are calculated from the state transition probabilities of the modified product machine. Solving the simultaneous linear equations accounts for a large portion of the execution time, so we also propose two acceleration techniques that reduce the number of variables in these equations and allow the estimation method to be applied to larger circuits. We apply the proposed method to ISCAS'89 and MCNC benchmark circuits and estimate error propagation probabilities for sequential circuits. Experimental results show that the total execution time of the proposed method with the two acceleration techniques is up to 10 times shorter than that of a naive implementation.
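The link between a Markov model and simultaneous linear equations can be sketched with a toy absorbing Markov chain. This is a generic textbook formulation, not the paper's modified product machine: the two transient states, the transition probabilities, and the function names are hypothetical. If `q[i][j]` is the probability of moving from transient state `i` to transient state `j`, and `r[i]` is the probability of moving from state `i` directly to the "failure observed at a primary output" state, then the observation probabilities `x` satisfy `(I - Q) x = r`.

```python
from fractions import Fraction as F
from typing import List


def solve(a: List[List[F]], b: List[F]) -> List[F]:
    """Solve a x = b by Gauss-Jordan elimination over exact fractions."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        # Pick a row with a non-zero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def observation_probabilities(q: List[List[F]], r: List[F]) -> List[F]:
    """Probability, per transient state, of eventually reaching the observed state."""
    n = len(q)
    i_minus_q = [[(F(1) if i == j else F(0)) - q[i][j] for j in range(n)]
                 for i in range(n)]
    return solve(i_minus_q, r)


# Hypothetical chain with two transient states; the remaining probability
# mass of each row goes to an absorbing "error masked" state.
q = [[F(0), F(1, 2)],
     [F(1, 4), F(0)]]
r = [F(1, 4), F(1, 2)]

probs = observation_probabilities(q, r)
print(probs)
```

The system has one unknown per transient state, which is why reducing the number of variables, as the abstract's acceleration techniques do, directly reduces the cost of the solve.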
Citations: 1
Energy-efficient High-level Synthesis for HDR Architectures
Q4 Engineering Pub Date : 2012-08-17 DOI: 10.2197/ipsjtsldm.5.106
Shin-ya Abe, M. Yanagisawa, N. Togawa
Citations: 13
A Stackable LTE Chip for Cost-effective 3D Systems
Q4 Engineering Pub Date : 2012-01-01 DOI: 10.2197/ipsjtsldm.5.2
W. Lafi, D. Lattard, A. Jerraya
To address the problem of prohibitive cost of advanced fabrication technologies, one solution consists in reusing masks to address a wide range of ICs. This could be achieved by a modular circuit that can be stacked to build TSV-based 3D systems with processing performance adapted to several applications. This paper focuses on 4G wireless telecom applications. We propose a basic circuit that meets the SISO (Single Input Single Output) transmission mode. By stacking multiple instances of this same circuit, it will be possible to address several MIMO (Multiple Input Multiple Output) modes. The proposed circuit is composed of several processing units interconnected by a 3D NoC and controlled by a host processor. Compared to a 2D reference platform, the proposed circuit keeps at least the same performance and power consumption in the context of 4G telecom applications, while reducing total cost. More generally, our cost analysis shows that 3D integration efficiency depends on the size of the circuit and the stacking option (die-to-die, die-to-wafer and interposer-based stacking).
Citations: 4
System-On-Chip for Biologically Inspired Vision Applications
Q4 Engineering Pub Date : 2012-01-01 DOI: 10.2197/ipsjtsldm.5.71
Sungho Park, Ahmed Al-Maashri, K. Irick, A. Chandrashekhar, M. Cotter, Nandhini Chandramoorthy, M. DeBole, N. Vijaykrishnan
Neuromorphic vision algorithms are biologically inspired computational models of the primate visual pathway. They promise robustness, high accuracy, and high energy efficiency in advanced image processing applications. Despite these potential benefits, implementations of neuromorphic algorithms typically exhibit low performance even when executed on multi-core CPU and GPU platforms. This is due to the disparity between the computational modalities prominent in these algorithms and those most exploited in contemporary computer architectures. In essence, accelerating neuromorphic algorithms requires adherence to specific computational and communication requirements. This paper discusses these requirements and proposes a framework for mapping neuromorphic vision applications onto a System-on-Chip (SoC). Neuromorphic object detection and recognition on a multi-FPGA platform is presented, with performance and power efficiency comparisons to CMP and GPU implementations.
Citations: 22