IPSJ Transactions on System LSI Design Methodology: Latest Articles

Courier: A Toolchain for Application Acceleration on Heterogeneous Platforms
Q4 Engineering Pub Date : 2015-02-01 DOI: 10.2197/ipsjtsldm.8.105
Takaaki Miyajima, David B. Thomas, H. Amano
Computationally intensive applications that use open-source libraries such as OpenCV, BLAS, or FFT are widely used in research and industry. Although optimized code for such libraries has been prepared for accelerators, off-loading is difficult for non-expert users, especially when only the application binary is accessible. This paper presents Courier, a new toolchain for application acceleration. It requires only an executable binary of the target application and the corresponding function code for an accelerator; it requires neither the application’s source code nor recompilation of the binary. Courier’s workflow is simple and intended for non-expert users. It extracts runtime information from the running binary, generates a task graph, and then replaces the original function with the corresponding accelerator function. Many steps of the acceleration process are executed automatically. Users can inspect the acceleration result and modify the task graph if needed. In our case studies, Courier was used to accelerate three applications: image processing, matrix multiplication, and spectrum analysis. Functions were off-loaded to a GPU without any modification to the original source code. The applications were sped up 8.89, 8.16, and 1.23 times, respectively.
Citations: 0
Layer Assignment and Equal-length Routing for Disordered Pins in PCB Design
Q4 Engineering Pub Date : 2015-02-01 DOI: 10.2197/ipsjtsldm.8.75
Ran Zhang, Tieyuan Pan, Li Zhu, Takahiro Watanabe
In recent printed circuit board (PCB) design, the high density of integration has made signal propagation delay, or skew, an important factor in circuit performance. Because routing delay is proportional to wire length, controllability of the wire length is usually the main concern. This research proposes a heuristic algorithm to obtain equal-length routing for disordered pins in PCB design. The approach first checks the longest common subsequence of the source and target pin sets to assign layers to pins. A single-commodity flow is then carried out to generate the base routes. Finally, considering the target length requirement and the available routing region, R-flip and C-flip are adopted to adjust the wire length. The experimental results show that the proposed method obtains routes with better wire-length balance and a smaller worst-case length error in reasonable CPU time.
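The layer-assignment idea in the abstract can be illustrated with a small sketch. The abstract only says that the longest common subsequence (LCS) of the source and target pin orderings is used to assign layers; the greedy peeling loop below (take one LCS per layer, remove those pins, repeat) is an assumption made for illustration and is not claimed to be the authors' procedure. The function names `lcs` and `assign_layers` are hypothetical, and both orderings are assumed to contain the same unique pin labels.

```python
def lcs(a, b):
    """Return one longest common subsequence of sequences a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def assign_layers(source_order, target_order):
    """Greedy layer assignment (illustrative assumption, not the paper's method).

    Pins that appear in the same relative order on both sides can share a
    layer without their connections crossing, so each layer takes one LCS.
    Assumes both orderings contain the same unique pin labels.
    """
    src, tgt = list(source_order), list(target_order)
    layers = []
    while src:
        common = lcs(src, tgt)
        layers.append(common)
        chosen = set(common)
        src = [p for p in src if p not in chosen]
        tgt = [p for p in tgt if p not in chosen]
    return layers


layers = assign_layers([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
print(len(layers), layers)
```

Each inner list holds the pins placed on one layer, in source order. The length-matching steps named in the abstract (single-commodity flow, R-flip, C-flip) are outside the scope of this sketch.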
Citations: 2
Efficient Design Exploration Framework of SW/HW Systems Based on Tightly-coupled Thread Model
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.38
A. Khan, T. Isshiki, Dongju Li, H. Kunieda
To meet the increased computational requirements of today’s portable consumer devices, heterogeneous multiprocessor system-on-chip (MPSoC) architectures have become widespread. These MPSoCs include not only multiple processors but also multiple dedicated hardware accelerators. Because of the increased complexity of MPSoCs, fast and accurate design space exploration (DSE) for the best system performance at an early stage of the design process is desired. A DSE solution should provide the system partitioning scheme that gives the best performance with efficient area utilization. In this paper we propose a design space exploration framework for heterogeneous MPSoCs based on the tightly-coupled thread (TCT) parallel programming model, which can handle both system partition exploration and HW synthesis exploration. The proposed framework drastically reduces the exponential-size design space to near-linear size by using accurate HW timing models as the indicator of the system bottleneck and to guide the enumeration of HW version combinations. Experimental results show the accuracy of the proposed method, with an average estimation error of 1.38% for the HW timing of each thread and an estimation error of 2.80% for the system-level simulation, where the simulation speedup factor was on the order of 5,000 times. Currently the proposed framework partially depends on the high-level synthesis (HLS) tool eXCite, but other HLS tools can be easily integrated into the proposed framework.
Citations: 0
An Allocation Optimization Method for Partially-reliable Scratch-pad Memory in Embedded Systems
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.100
Takuya Hatayama, Hideki Takase, K. Takagi, N. Takagi
In this paper, we propose the use of a memory system with a partially reliable scratch-pad memory (SPM). The reliable region of the SPM, which employs ECC, tolerates soft errors better than the normal region but consumes more energy. We propose an allocation method that optimizes energy consumption while ensuring the required reliability. The allocation of instructions and data to the proposed memory system is formulated as an integer linear program whose solution achieves optimal energy consumption under the required reliability. Evaluation results show that the proposed method is effective when the overhead of error correction is large.
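To make the shape of such an allocation problem concrete, here is a minimal sketch with one 0/1 placement decision per block, an energy objective, and capacity and reliability constraints. Everything in it is an assumption for illustration: the abstract does not give the cost model, so the field names (`size`, `accesses`, `e_reliable`, `e_normal`, `vuln`), the additive vulnerability budget, and the function name `allocate` are hypothetical. Exhaustive enumeration stands in for an ILP solver and is only practical for a handful of blocks.

```python
from itertools import product


def allocate(blocks, cap_reliable, cap_normal, max_vulnerability):
    """Pick the minimum-energy placement of blocks into two SPM regions.

    Each block is a dict with hypothetical fields:
      size        bytes occupied in whichever region it is placed
      accesses    number of accesses to the block
      e_reliable  energy per access in the ECC-protected region
      e_normal    energy per access in the unprotected region
      vuln        vulnerability contributed if placed in the normal region
    Returns (energy, placement) or None if no placement is feasible.
    """
    best = None
    for placement in product(("reliable", "normal"), repeat=len(blocks)):
        pairs = list(zip(blocks, placement))
        used_r = sum(b["size"] for b, where in pairs if where == "reliable")
        used_n = sum(b["size"] for b, where in pairs if where == "normal")
        if used_r > cap_reliable or used_n > cap_normal:
            continue  # capacity constraint violated
        vuln = sum(b["vuln"] for b, where in pairs if where == "normal")
        if vuln > max_vulnerability:
            continue  # reliability constraint violated
        energy = sum(
            b["accesses"] * (b["e_reliable"] if where == "reliable" else b["e_normal"])
            for b, where in pairs
        )
        if best is None or energy < best[0]:
            best = (energy, placement)
    return best


blocks = [
    {"size": 2, "accesses": 100, "e_reliable": 3, "e_normal": 1, "vuln": 5},
    {"size": 2, "accesses": 10, "e_reliable": 3, "e_normal": 1, "vuln": 1},
]
print(allocate(blocks, cap_reliable=2, cap_normal=2, max_vulnerability=2))
```

With these made-up numbers, the frequently accessed but vulnerable block has to go into the costlier reliable region, because placing it in the normal region would exceed the vulnerability budget.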
Citations: 0
DRAMSys: A Flexible DRAM Subsystem Design Space Exploration Framework
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.63
Matthias Jung, C. Weis, N. Wehn
In systems ranging from mobile devices to servers, Dynamic Random Access Memories (DRAMs) have a large impact on performance and contribute a significant part of the total power consumption. Conventional DDR3-based solutions are stretched thin because their maximum bandwidth is limited by the I/O count and interface speed. As new solutions come onto the market (JEDEC DDR4, JEDEC Wide I/O, Micron’s Hybrid Memory Cube (HMC), and JEDEC’s High Bandwidth Memory (HBM)), it is critical to evaluate their performance and assess their suitability for specific applications. Furthermore, in systems with 3D stacking, the challenges of high power density and thermal dissipation are exacerbated. It is crucial to have a flexible and holistic DRAM subsystem framework for exhaustive design space exploration that can handle all these different types of memory as well as the aspects of performance, power, and temperature.
Citations: 55
Layout Generator with Flexible Grid Assignment for Area Efficient Standard Cell
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.131
S. Nishizawa, T. Ishihara, H. Onodera
This paper discusses a standard cell layout generator that can be used to generate a standard cell library optimized for a target application. It generates an area-efficient layout from a virtual-grid symbolic layout, using flexible grid positioning that takes into account the local design rules enforced in a scaled technology. The generator reduces the cost of library design and enables each cell to be optimized with detailed layout information that can be used to estimate the performance of the cell under design. A standard cell library has been generated for a commercial 28-nm FDSOI CMOS process using the proposed layout generator and used for circuit design. Correct operation of the designed circuit was observed in tests of the fabricated chip.
Citations: 1
Automatic Synthesis of Inter-heterogeneous-processor Communication for Programmable System-on-chip
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.95
Yuki Ando, Y. Ishida, S. Honda, H. Takada, M. Edahiro
This paper introduces an automatic synthesis technique and tool to implement inter-heterogeneous-processor communication for programmable system-on-chips (PSoCs). PSoCs have an ARM-based hard processor system connected to an FPGA fabric. By implementing soft processors in the FPGA fabric, PSoCs realize heterogeneous multiprocessors. Since the number and type of soft processors are configurable, a PSoC can form a variety of heterogeneous multiprocessors. However, inter-heterogeneous-processor communication is not supported by single-binary operating systems. The proposed method automatically synthesizes inter-heterogeneous-processor communication at the application layer from a general model description. The case study shows that the automatically generated inter-heterogeneous-processor communication runs the system correctly on heterogeneous multiprocessors.
Citations: 2
An Efficient Performance Estimation Method for Configurable Multi-layer Bus-based SoC
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.26
Salita Sombatsiri, Y. Takeuchi, M. Imai
This paper proposes an efficient performance estimation method for configurable multi-layer bus-based SoCs that evaluates system performance at an early stage of the design process. The proposed method uses data flow information obtained from system-level profiling, an architecture-independent, loosely-timed transaction-level simulation, to construct a system-level execution dependency graph. Then, for each architecture-level model, an architecture-level execution dependency graph is constructed and analyzed to estimate the performance of that architecture. In the analysis, the detailed behavior of shared buses and multi-layer buses is determined from the analyzed dynamic bus contention and the features of the bus protocols. Experiments were conducted by modeling the multi-layer AHB and applying the method to estimate the performance of architectures executing a JPEG encoder application. The proposed method estimates SoC performance with less than 8% error compared with the results of accurate RTL simulations.
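One common way to turn an execution dependency graph into a time estimate is to compute its longest (critical) path. The sketch below does only that. It is an assumption about how such a graph might be analyzed, not a description of the paper's method, and it ignores the bus contention and protocol modeling mentioned in the abstract. The node names and cycle counts are made up, the function name `critical_path_length` is hypothetical, and every node referenced as a predecessor is assumed to have a duration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_length(durations, deps):
    """Estimate total execution time as the longest path through a DAG.

    durations: dict mapping node -> execution time (e.g. cycles)
    deps:      dict mapping node -> iterable of nodes it must wait for
    Assumes every node mentioned in deps also appears in durations.
    """
    graph = {node: set(deps.get(node, ())) for node in durations}
    finish = {}
    # static_order() yields each node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[node]), default=0)
        finish[node] = start + durations[node]
    return max(finish.values(), default=0)


# Hypothetical JPEG-encoder-like pipeline with a parallel bus transfer.
durations = {"read": 4, "dct": 10, "quant": 3, "bus": 6, "write": 2}
deps = {
    "dct": ["read"],
    "bus": ["read"],
    "quant": ["dct"],
    "write": ["quant", "bus"],
}
print(critical_path_length(durations, deps))
```

Contention could be approximated in such a model by inflating the duration of bus nodes, but how the paper derives those details is not stated in the abstract.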
Citations: 0
Heterogeneous Multi-core Architectures
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.51
T. Mitra
Transistor counts continue to increase for silicon devices, following Moore’s Law. But the failure of Dennard scaling has brought the computing community to a crossroads where power has become the major limiting factor. Future chips can therefore have many cores, but only a fraction of them can be switched on at any point in time. This dark silicon era, in which a significant fraction of the chip real estate remains dark, has necessitated a fundamental rethinking of architectural designs. In this context, heterogeneous multi-core architectures, which combine a mix of processing cores that differ in functionality and performance (CPU, GPU, special-purpose accelerators, and reconfigurable computing), offer a promising option. Heterogeneous multi-cores can potentially provide energy-efficient computation because only the cores most suitable for the current computation need to be switched on. This article presents an overview of the state of the art in the heterogeneous multi-core landscape.
Citations: 27
Memory and Storage System Design with Nonvolatile Memory Technologies
Q4 Engineering Pub Date : 2015-01-01 DOI: 10.2197/ipsjtsldm.8.2
Jishen Zhao, Cong Xu, Ping Chi, Yuan Xie
The memory and storage system, including processor caches, main memory, and storage, is an important component of various computer systems. The memory hierarchy is becoming a fundamental performance and energy bottleneck, due to the widening gap between the increasing bandwidth and energy demands of modern applications and the limited performance and energy efficiency provided by traditional memory technologies. As a result, computer architects are facing significant challenges in developing high-performance, energy-efficient, and reliable memory hierarchies. New byte-addressable nonvolatile memories (NVMs) are emerging with unique properties that are likely to open doors to novel memory hierarchy designs to tackle the challenges. However, substantial advancements in redesigning the existing memory and storage organizations are needed to realize their full potential. This article reviews recent innovations in rearchitecting the memory and storage system with NVMs, producing high-performance, energy-efficient, and scalable computer designs.
Citations: 19