
IPSJ Transactions on System LSI Design Methodology: Latest Articles

Shift Register Initialization in Scalar Replacement for Reducing Code Size
Q4 Engineering Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.2
Kenshu Seto
Scalar replacement is an effective technique for improving the performance of RTL code generated by high-level synthesis (HLS) from C programs with intensive array accesses. In scalar replacement, data read from arrays are stored in shift registers, and later array accesses to the same data are replaced with accesses to the shift registers. Because arrays in C programs are usually mapped to RAMs with a limited number of ports, reducing array accesses in this way reduces memory accesses, which in turn improves the performance of the resulting RTL code. In real-life C programs, shift registers must sometimes be initialized conditionally using multiple array accesses, which increases the number of array accesses in main loops. To reduce the conditional array accesses in the main loops, the previous scalar replacement method proposed using a loop transformation called loop peeling. Loop peeling significantly increases code size, which negatively affects the performance or circuit area of the synthesized hardware. In this paper, we propose a new method that initializes shift registers without loop peeling. The proposed method works as a preprocessing step on the input C program prior to scalar replacement. Experimental results demonstrate that the proposed method reduces the number of execution cycles of the synthesized hardware compared to the previous method.
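As a rough illustration of the basic idea of scalar replacement described in this abstract, the sketch below compares a 3-point sliding sum that reads the array three times per iteration with a version that keeps the two previous values in a small shift register. It is written in Python for brevity rather than in the C that an HLS flow would consume, the function names are hypothetical, and it does not reproduce the paper's initialization method or loop peeling; it only counts array reads.

```python
def stencil_direct(a):
    """3-point sliding sum that reads the array three times per iteration."""
    reads = 0
    out = []
    for i in range(2, len(a)):
        out.append(a[i - 2] + a[i - 1] + a[i])
        reads += 3
    return out, reads


def stencil_scalar_replaced(a):
    """Same sum, with a[i-2] and a[i-1] held in a 2-entry shift register.

    Assumes len(a) >= 2 so that the register can be initialized.
    """
    reads = 0
    out = []
    # Initialize the shift register before the main loop.
    r2, r1 = a[0], a[1]
    reads += 2
    for i in range(2, len(a)):
        x = a[i]          # the only array read left in the loop body
        reads += 1
        out.append(r2 + r1 + x)
        r2, r1 = r1, x    # shift the register by one position
    return out, reads


if __name__ == "__main__":
    data = list(range(10))
    print(stencil_direct(data)[1])           # 24 array reads
    print(stencil_scalar_replaced(data)[1])  # 10 array reads
```

Both functions return the same sums; only the number of array reads differs, which is the quantity that matters when the array is mapped to a RAM with few ports.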
Citations: 2
A Logic Optimization Method by Eliminating Redundant Multiple Faults from Higher to Lower Cardinality
Q4 Engineering Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.35
P. Wang, A. M. Gharehbaghi, M. Fujita
In this paper, we propose a logic optimization method that removes redundancy in a circuit. An incremental Automatic Test Pattern Generation method is used to find redundant multiple faults. To remove as many redundancies as possible, instead of removing the redundant single faults first, we eliminate the redundant faults from higher cardinality to lower cardinality. The experiments show that the proposed method eliminates more redundancies than the redundancy removal command in the synthesis tool SIS.
Citations: 0
Real Circuit Delay Measurement Method by Variable Frequency Operation with On-Chip Fine Resolution Oscillator
Q4 Engineering Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.21
K. Shimamura, Naohiro Ikeda
With the progress of semiconductor process miniaturization, delay degradation caused by aging increases and threatens the reliability of fabricated chips. The amount of delay degradation is known to depend on the circuit and the workload, but previous evaluations are based on simulations, and delay degradation measurement of a real circuit under a realistic workload has not yet been reported. This paper proposes a real-circuit delay measurement method that achieves enough accuracy to measure circuit- and workload-dependent delay degradation. In the proposed method, an on-chip oscillator supplies a fine-resolution variable-frequency clock to the internal circuit. The internal circuit executes a test pattern that activates critical paths at various frequencies, and the maximum frequency at which correct results can be obtained is determined. This maximum frequency corresponds to the delay of the critical paths activated by the test pattern. Clock multiplication improves delay resolution, and repeated measurement reduces the measurement error caused by time-dependent random delay variation. The proposed method has been implemented on a 65 nm low-power process test chip. The variable-frequency oscillator uses only standard cells and is designed with an automatic layout flow without any timing tuning. The area overhead of the proposed method is 0.09% of the total random logic. The evaluation results show that an average measurement accuracy of 0.18% has been achieved.
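The frequency-sweep idea in this abstract can be sketched in software as follows. This is only a toy model under stated assumptions: the pass/fail function `passes`, the sweep range, and the step size are hypothetical stand-ins for running a test pattern on the real chip, and the sketch does not model clock multiplication or repeated measurement.

```python
def passes(freq_mhz, true_delay_ns):
    """Hypothetical pass/fail model of the circuit under test.

    The test pattern is assumed to produce correct results whenever the
    clock period is at least as long as the critical-path delay.
    """
    period_ns = 1000.0 / freq_mhz
    return period_ns >= true_delay_ns


def measure_delay_ns(true_delay_ns, f_start_mhz=100.0,
                     f_step_mhz=0.5, f_stop_mhz=1000.0):
    """Sweep the clock upward and return the period of the last passing step."""
    f = f_start_mhz
    f_max = None
    while f <= f_stop_mhz and passes(f, true_delay_ns):
        f_max = f
        f += f_step_mhz
    if f_max is None:
        raise ValueError("circuit fails even at the starting frequency")
    # The highest passing frequency corresponds to the critical-path delay.
    return 1000.0 / f_max


if __name__ == "__main__":
    print(measure_delay_ns(4.0))  # 4.0 (250 MHz is the last passing step)
```

In this model the estimate is never below the true delay, and a smaller `f_step_mhz` brings it closer to the true value, which mirrors why a fine-resolution clock source matters for the measurement.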
Citations: 1
An FPGA Implementation Method based on Distributed-register Architectures
Q4 Engineering Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.38
Koichi Fujiwara, Kazushi Kawamura, M. Yanagisawa, N. Togawa
Citations: 0
Circuit Techniques for Device-Circuit Interaction toward Minimum Energy Operation
Q4 Engineering Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.2
A. Islam, H. Onodera
Citations: 3
Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA
Q4 Engineering Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.22
Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi
This paper proposes a convolution core for sparse CNNs that can flexibly alternate between parallelism schemes and degrees, exploiting the intra- and inter-output parallelism of the convolutional layer, and that leverages weight sparsity by using a sparse model compressed in the compressed sparse column format together with an output-stationary dataflow. The experimental results show that performance improves by 3.9 times even in the deeper layers, where the conventional accelerator cannot fully exploit parallelism because of the small layer size. The proposed architecture can also exploit weight sparsity. By combining multi-parallelism and weight sparsity, the proposed architecture achieves 5.2 times better performance than the conventional accelerator.
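For readers unfamiliar with the compressed sparse column (CSC) format mentioned in this abstract, the following sketch shows the generic storage layout and a matrix-vector product that skips zero weights. It illustrates the CSC format only; the function names are hypothetical, and nothing here models the paper's convolution core, its parallelism schemes, or its dataflow.

```python
def to_csc(dense):
    """Convert a dense matrix (list of rows) into CSC arrays.

    Returns (values, row_idx, col_ptr): the nonzeros stored column by
    column, the row index of each nonzero, and the start offset of each
    column in `values`.
    """
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            v = dense[r][c]
            if v != 0:
                values.append(v)
                row_idx.append(r)
        col_ptr.append(len(values))
    return values, row_idx, col_ptr


def csc_matvec(values, row_idx, col_ptr, x, n_rows):
    """Compute y = W x from the CSC arrays, touching only nonzero weights."""
    y = [0] * n_rows
    for c in range(len(col_ptr) - 1):
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += values[k] * x[c]
    return y


if __name__ == "__main__":
    w = [[1, 0, 2],
         [0, 0, 3],
         [4, 0, 0]]
    vals, rows, ptrs = to_csc(w)
    print(vals, rows, ptrs)   # [1, 4, 2, 3] [0, 2, 0, 1] [0, 2, 2, 4]
    print(csc_matvec(vals, rows, ptrs, [1, 2, 3], 3))  # [7, 9, 4]
```

The work done by `csc_matvec` scales with the number of nonzero weights rather than with the full matrix size, which is the general reason sparse formats help accelerators.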
Citations: 4
Scalar Replacement with Circular Buffers
Q4 Engineering Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.13
Kenshu Seto
Scalar replacement is an effective array access optimization that can be applied before high-level synthesis (HLS). Successful application of scalar replacement removes local memories and, as a result, decreases hardware area. In addition, scalar replacement reduces the number of hardware execution cycles by reducing memory access conflicts. In scalar replacement, shift registers are introduced to remove local arrays, and the reuse distances correspond to the lengths of the shift registers. Previous scalar replacement methods implement the shift registers with chains of registers, so the hardware area becomes large when the reuse distances are large. In addition, when reuse distances are unknown at compile time, previous scalar replacement methods require multiplexers with large numbers of inputs, which further increases hardware area. In this paper, we propose a new technique to resolve these issues. In particular, we implement the shift registers with circular buffers instead of chains of registers. Large shift registers implemented with RAM-based circular buffers are more compact than those implemented with chains of registers. We also show that the proposed method requires no multiplexers to realize scalar replacement for loops with statically unknown reuse distances, which leads to an area-efficient hardware implementation. We developed a tool that implements the method and applied it to benchmark programs that require large shift registers or have statically unknown reuse distances. We found that the proposed method reduces hardware area compared to the previous method without sacrificing hardware performance. We conclude that the proposed method is an area-efficient scalar replacement method for programs that have large or unknown reuse distances at compile time.
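The contrast between a register chain and a circular buffer described in this abstract can be sketched as two software models of a fixed-delay shift register. This is a behavioral illustration under assumptions, not the paper's tool or its generated hardware: the class names are hypothetical, and a Python list stands in for both the registers and the RAM.

```python
class RegisterChain:
    """Shift register modeled as a chain: every element moves on each shift."""

    def __init__(self, length):
        self.regs = [0] * length

    def shift(self, x):
        """Push x and return the value pushed `length` shifts earlier."""
        oldest = self.regs[-1]
        self.regs = [x] + self.regs[:-1]
        return oldest


class CircularBuffer:
    """Same behavior with one memory and a pointer; no data is moved."""

    def __init__(self, length):
        self.mem = [0] * length   # stands in for a RAM
        self.ptr = 0              # next slot to read and overwrite

    def shift(self, x):
        oldest = self.mem[self.ptr]
        self.mem[self.ptr] = x
        self.ptr = (self.ptr + 1) % len(self.mem)
        return oldest


if __name__ == "__main__":
    chain, ring = RegisterChain(3), CircularBuffer(3)
    print([chain.shift(x) for x in range(1, 8)])  # [0, 0, 0, 1, 2, 3, 4]
    print([ring.shift(x) for x in range(1, 8)])   # [0, 0, 0, 1, 2, 3, 4]
```

In the circular-buffer model, each shift performs one read and one write at a single address, and the length is just a value used in the pointer wrap-around, which gives some intuition for why a reuse distance that is only known at run time is easier to accommodate than in a fixed chain.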
Citations: 3
An OpenCL-based Software Framework for a Heterogeneous Multicore Architecture on Zynq-7000 SoC
Q4 Engineering Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.46
T. Miyazaki, Shunsuke Takai, Ittetsu Taniguchi, H. Tomiyama
This paper presents an OpenCL-based software framework that we have developed for a heterogeneous multicore architecture on the Zynq-7000 SoC. In this work, the heterogeneous architecture is designed with two hard-macro Cortex-A9 cores and two soft-macro MicroBlaze cores. A major advantage of our OpenCL framework is that it can execute OpenCL kernel programs in three ways. Experiments show the usefulness of the OpenCL framework.
Citations: 1
Neuromorphic Computing Systems: From CMOS To Emerging Nonvolatile Memory
Q4 Engineering Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.53
Chaofei Yang, Ximing Qiao, Yiran Chen
The end of Moore's Law and the von Neumann bottleneck motivate researchers to seek alternative architectures that can fulfill the increasing demand for computation resources, which cannot be easily met by the traditional computing paradigm. As one important practice, neuromorphic computing systems (NCS) are proposed to mimic the biological behaviors of neurons and synapses and to accelerate the computation of neural networks. Traditional CMOS-based implementations of NCS, however, are subject to the large hardware cost required to precisely replicate the biological properties. In the most recent decade, emerging nonvolatile memory (eNVM) was introduced to NCS design because of its high computing efficiency and integration density. Like circuits built on other nanoscale devices, eNVM-based NCS also suffer from many reliability issues. In this paper, we give a short survey of CMOS- and eNVM-based NCS, including their basic implementations and their training and inference schemes in various applications. We also discuss the design challenges of these NCS and introduce some techniques that can improve the reliability, precision, scalability, and security of the NCS. At the end, we provide our insights on the design trends and future challenges of the NCS.
Citations: 0
A Genetic Algorithm for Scheduling of Data-parallel Tasks on Multicore Architectures
Q4 Engineering Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.74
Yang Liu, Lin Meng, H. Tomiyama
This paper proposes a genetic algorithm for scheduling multiple data-parallel tasks on multicores. Unlike traditional task scheduling, this work allows individual tasks to run on multiple cores in a data-parallel fashion. Experimental results show the effectiveness of the proposed algorithm over state-of-the-art
Citations: 4