
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Publications

Welcome from the MCSoC 2021 Chairs
DOI: 10.1109/mcsoc51149.2021.00005 | Published: 2021-12-01
Citations: 0
FPGA based Adaptive Hardware Acceleration for Multiple Deep Learning Tasks
Yufan Lu, X. Zhai, S. Saha, Shoaib Ehsan, K. Mcdonald-Maier
Machine learning, and in particular deep learning (DL), has seen strong success in a wide variety of applications, e.g. object detection, image classification and self-driving. However, due to limits on hardware resources and power consumption, deploying deep learning algorithms on resource-constrained mobile and embedded systems poses many challenges, especially for systems running multiple DL algorithms for a variety of tasks. In this paper, an adaptive hardware resource management system, implemented on field-programmable gate arrays (FPGAs), is proposed to dynamically manage the on-chip hardware resources (e.g. LUTs, BRAMs and DSPs) and adapt to a variety of tasks. Using dynamic function exchange (DFX) technology, the system can dynamically allocate hardware resources to deploy deep learning units (DPUs) so as to balance the requirements, performance and power consumption of the deep learning applications. The prototype is implemented on Xilinx Zynq UltraScale+ series chips. Experimental results indicate that the proposed scheme significantly improves the computing efficiency of resource-constrained systems under various experimental scenarios. Compared to the baseline, the proposed strategy consumes 38% and 82% of the baseline power in low- and high-workload cases, respectively. Typically, the proposed system saves approximately 75.8% of energy.
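The paper's DFX-based manager is not public, but the core idea of picking DPU configurations under a fixed on-chip resource budget can be sketched as follows. All configuration names, resource costs, and the budget below are illustrative assumptions, not values from the paper.

```python
# Conceptual sketch (not the authors' system): greedily assign the largest DPU
# configuration that still fits the remaining FPGA resource budget.

# Hypothetical resource cost of each DPU configuration (LUTs, BRAMs, DSPs).
DPU_CONFIGS = {
    "small":  {"lut": 30_000, "bram": 100, "dsp": 300},
    "medium": {"lut": 55_000, "bram": 180, "dsp": 550},
    "large":  {"lut": 90_000, "bram": 280, "dsp": 900},
}

BUDGET = {"lut": 200_000, "bram": 600, "dsp": 1_800}  # illustrative chip budget

def fits(used, cfg):
    return all(used[k] + cfg[k] <= BUDGET[k] for k in BUDGET)

def allocate(task_loads):
    """Heaviest tasks first; each gets the largest DPU that still fits."""
    used = {k: 0 for k in BUDGET}
    plan = {}
    for task, load in sorted(task_loads.items(), key=lambda kv: -kv[1]):
        for name in ("large", "medium", "small"):
            cfg = DPU_CONFIGS[name]
            if fits(used, cfg):
                for k in BUDGET:
                    used[k] += cfg[k]
                plan[task] = name
                break
        else:
            plan[task] = None  # no resources left: task waits for reconfiguration
    return plan

plan = allocate({"detect": 0.9, "classify": 0.5, "track": 0.2})
```

A real DFX flow would additionally reconfigure partial bitstreams at runtime; this sketch only captures the budgeting decision.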
DOI: 10.1109/MCSoC51149.2021.00038 | Published: 2021-12-01
Citations: 3
RELAX: a REconfigurabLe Approximate Network-on-Chip
Richard Fenster, S. L. Beux
The high error resilience of numerous applications such as neural networks and signal processing has opened new optimization opportunities in manycore systems. Indeed, approximate computing enables reducing data bit widths, which relaxes design constraints on computing resources and memory. However, on-chip interconnects can hardly take advantage of the reduced data size, since they must also transmit full-sized data. Consequently, existing approximate networks-on-chip (NoCs) either add physical layers dedicated to approximate data or significantly increase the energy needed to transfer non-approximate data. To address this challenge, we propose RELAX, a reconfigurable network-on-chip that can operate in an accurate-only mode or a mixed mode. The mixed mode allows concurrent accurate and approximate data transactions over the same physical layer, enabling efficient transmission of approximate data while reducing resource overhead. Synthesis and simulation results show that RELAX improves the communication latency of approximate data by up to 44.2% compared to an accurate-only baseline 2D-mesh NoC.
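The bit-width reduction that approximate NoCs exploit can be illustrated with a toy packing scheme: keep only the 16 most-significant bits of two 32-bit values and share one flit between them. The 32/16-bit split below is an assumption for illustration, not RELAX's actual flit format.

```python
# Toy illustration (not RELAX's protocol): two bit-truncated 32-bit values
# share one 32-bit flit, halving the flits needed for approximate traffic.

def pack_approx(a, b):
    """Keep the top 16 bits of each 32-bit value; pack both into one flit."""
    return ((a >> 16) << 16) | (b >> 16)

def unpack_approx(flit):
    """Recover the truncated values; the low 16 bits of each are lost."""
    return (flit >> 16) << 16, (flit & 0xFFFF) << 16

flit = pack_approx(0x12345678, 0x9ABCDEF0)
a, b = unpack_approx(flit)  # each within 2**16 of its original value
```

The approximation error per value is bounded by the discarded low bits (here below 2**16), which is what makes the scheme acceptable for error-resilient workloads.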
DOI: 10.1109/MCSoC51149.2021.00063 | Published: 2021-12-01
Citations: 0
Boosting CPU Performance using Pipelined Branch and Jump Folding Hardware with Turbo Module
Mong Tee Sim
The new generation of embedded applications demands both high performance and energy efficiency. This paper presents a new hardware design that supports architecture-level thread isolation, together with logic to fold branch and jump instructions and a Turbo module, thereby reducing the overall number of instructions flowing through the CPU without causing any pipeline stalls. By pipelining the branch and jump folding logic across multiple threads of execution, the hardware can continuously operate at peak CPU speed, with power consumption reduced by cutting the number of microcontrollers required in the system. We show that this novel technique accelerates system performance, increasing instructions per cycle to up to 1.36 and, with the Turbo module, up to 1.823, without requiring any extra programming effort from developers. We used Dhrystone, CoreMark, and ten selected benchmark metrics to validate the performance and functionality of our system.
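The effect of folding is easiest to see as a count of pipeline slots: folded branches and jumps are resolved by dedicated logic and never occupy an execute slot. The toy model below is purely conceptual and is not the paper's hardware.

```python
# Toy model (not the paper's design): count instructions that occupy pipeline
# slots with and without branch/jump folding on a small instruction trace.

def pipeline_instruction_count(trace, folding=False):
    """trace: list of instruction kinds, e.g. 'alu', 'branch', 'jump'."""
    if not folding:
        return len(trace)
    # With folding, branch/jump instructions are absorbed by the folding
    # hardware and never enter the execute pipeline.
    return sum(1 for kind in trace if kind not in ("branch", "jump"))

trace = ["alu", "alu", "branch", "alu", "jump", "alu"]
print(pipeline_instruction_count(trace))        # 6 slots without folding
print(pipeline_instruction_count(trace, True))  # 4 slots with folding
```

Fewer occupied slots for the same work is what shows up as the higher IPC figures reported above.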
DOI: 10.1109/MCSoC51149.2021.00060 | Published: 2021-12-01
Citations: 0
FPGA-Based Implementation of the Stereo Matching Algorithm Using High-Level Synthesis
Iman Firmansyah, Y. Yamaguchi
Stereo vision finds a wide range of applications in automotive systems, object detection, robot navigation, agricultural mapping, and others. Stereo matching is a stereo algorithm that identifies corresponding pixels in two or more images. This study presents an implementation of stereo matching using the Sum of Absolute Differences (SAD) algorithm to extract an object's depth, or disparity, from stereo images. Our key objective is to implement the stereo matching algorithm on a small field-programmable gate array (FPGA), requiring relatively few resources while maintaining processing speed and disparity-map quality. To meet this requirement, we used small window buffers to compute the stereo matching, and reduced occluded pixels by introducing a secondary consistency check. In experiments on the Zynq UltraScale+ ZCU102 FPGA board with the SDSoC compiler, the stereo matching algorithm with a 4×4 window buffer ran in 0.038 s for a 486×720-pixel image and in 0.051 s at 375×1242-pixel resolution. The proposed design needed 1% each of the BRAMs and FFs and 7% of the LUTs. An 18% reduction in pixel errors was observed when the secondary consistency matching was employed in post-processing.
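SAD block matching itself is standard: for each pixel, slide a window along the epipolar line and keep the disparity with the smallest sum of absolute differences. A minimal software reference (not the authors' HLS code, and without the consistency check) looks like this:

```python
import numpy as np

# Reference SAD block matching (software sketch, not the paper's HLS design):
# for each pixel of the left image, test candidate disparities d and pick the
# one minimizing the Sum of Absolute Differences over a small window.

def sad_disparity(left, right, window=4, max_disp=16):
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    half = window // 2
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half, x - half:x + half].astype(np.int32)
            costs = [
                np.abs(patch - right[y - half:y + half,
                                     x - d - half:x - d + half].astype(np.int32)).sum()
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))  # best-matching disparity
    return disp
```

The FPGA version replaces the inner loops with line buffers and parallel window comparisons, which is what the small 4×4 window buffers above enable.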
DOI: 10.1109/MCSoC51149.2021.00009 | Published: 2021-12-01
Citations: 2
A Computation-Aware TPL Utilization Procedure for Parallelizing the FastICA Algorithm on a Multi-Core CPU
Lan-Da Van, Tao Wang, Sing-Jia Tzeng, T. Jung
Independent Component Analysis is a widely used machine learning technique for separating mixed signals into statistically independent components. This study proposes a computation-aware (CA) Task Parallel Library (TPL) utilization procedure to parallelize the Fast Independent Component Analysis (FastICA) algorithm on a multi-core CPU. The proposed CA method separates complex from simple computations by profiling their execution times on a multi-core CPU; TPL is then used for the complex computations but not for the simple ones. Compared to the program without TPL, the proposed CA procedure reduces the execution time of decomposing 8- and 32-channel artificially mixed signals by 34.88% and 43.01%, respectively; compared to the fully parallelized TPL program, the reductions are 10.04% and 0.93%. Using CA TPL, the decomposition of 12-channel electroencephalogram (EEG) signals takes 48.27% less time than without it. The proposed CA procedure reduces execution time by 15.12% compared to the fully parallelized TPL program.
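The computation-aware split (parallelize only the steps whose measured cost justifies the task overhead) can be sketched in Python; the paper uses .NET's Task Parallel Library, for which `concurrent.futures` stands in here, and the threshold value is an assumption for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the computation-aware idea (the paper uses .NET's TPL; Python's
# concurrent.futures stands in): run only the expensive steps as parallel
# tasks, since task-creation overhead dominates for cheap steps.

PARALLEL_THRESHOLD_S = 1e-3  # illustrative cost cutoff, from prior profiling

def run_steps(steps, measured_cost):
    """steps: {name: zero-arg fn}; measured_cost: {name: profiled seconds}."""
    results = {}
    heavy = [n for n in steps if measured_cost[n] >= PARALLEL_THRESHOLD_S]
    light = [n for n in steps if n not in heavy]
    with ThreadPoolExecutor() as pool:
        futures = {n: pool.submit(steps[n]) for n in heavy}  # complex: parallel
        for n in light:                                      # simple: inline
            results[n] = steps[n]()
        for n, f in futures.items():
            results[n] = f.result()
    return results
```

In FastICA the "heavy" steps would be the per-iteration matrix products over all channels, while bookkeeping steps stay sequential; the split itself comes from measured execution times, as the abstract describes.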
DOI: 10.1109/MCSoC51149.2021.00033 | Published: 2021-12-01
Citations: 0
LUSH: Lightweight Framework for User-level Scheduling in Heterogeneous Multicores
Vasco Xu, Liam White McShane, D. Mossé
As heterogeneous multicore systems become standard in computing devices, there is an increasing need for intelligent, adaptive resource allocation schemes that balance performance and energy consumption. To support this growing need, researchers have explored a plethora of techniques to guide OS scheduling policies, including machine learning, statistical regression, and custom heuristics. Such techniques have been enabled by the abundance of low-level performance counters, and have proven effective both in characterizing applications and in predicting power and performance. However, most works require and develop custom infrastructures. In this paper we present LUSH, a Lightweight Framework for User-level Scheduling in Heterogeneous Multicores that allows users to develop their own customized scheduling policies without requiring root privileges. LUSH contributes the following to the state of the art: (1) a mechanism for monitoring application runtime behavior using performance counters; (2) a mechanism for exporting kernel data to user level at a user-defined period; and (3) a parameterized, flexible interface for developing, deploying, and evaluating novel algorithms applied to OS scheduling policies. The framework presented in this paper serves as a foundation for exploring advanced and intelligent techniques for resource management in heterogeneous systems.
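User-level scheduling without root privileges is possible because CPU affinity can be set from an unprivileged process. The sketch below is not LUSH; it illustrates the general pattern with a hypothetical big/little core layout and a toy utilization threshold, applied via the standard `os.sched_setaffinity` call (Linux only).

```python
import os

# Illustrative user-level scheduling sketch (not the LUSH framework): a policy
# maps a task to "big" or "little" cores from a utilization sample, then
# applies the decision with os.sched_setaffinity -- no root privileges needed.

BIG_CORES = {4, 5, 6, 7}     # hypothetical heterogeneous core layout
LITTLE_CORES = {0, 1, 2, 3}

def choose_cores(cpu_utilization, threshold=0.6):
    """Compute-bound tasks go to big cores, the rest to little cores."""
    return BIG_CORES if cpu_utilization >= threshold else LITTLE_CORES

def apply_policy(pid, cpu_utilization):
    cores = choose_cores(cpu_utilization)
    os.sched_setaffinity(pid, cores)  # pid 0 means the calling process
    return cores
```

A framework like LUSH would replace the toy threshold with policies driven by performance counters and kernel data exported at a user-defined period, as the contribution list above describes.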
DOI: 10.1109/MCSoC51149.2021.00065 | Published: 2021-12-01
Citations: 1
A Network Simulator for the Estimation of Bandwidth Load and Latency Created by Heterogeneous Spiking Neural Networks on Neuromorphic Computing Communication Networks
R. Kleijnen, M. Robens, M. Schiek, S. Waasen
Observing long-term learning effects caused by neuron activity in the human brain in vivo, over weeks, months, or years, is impractical. Over the last decade, the field of neuromorphic computing hardware has grown significantly, e.g. SpiNNaker, BrainScaleS, and Neurogrid. These novel many-core simulation platforms offer a practical alternative for studying neuron behaviour in the brain at an accelerated rate and with a high level of detail. However, they still fall far short of human-brain scale, with the massive volume of spike communication in particular proving to be a bottleneck. In this paper, we introduce a network simulator developed specifically for detailed analysis of the bandwidth load and latency of different network topologies and communication protocols in neuromorphic computing communication networks. Unique to this simulator, compared with state-of-the-art network models and simulators, is its ability to simulate the impact of heterogeneous neural connectivity under different models, as well as to evaluate neuron mapping algorithms. We cross-check the simulator by comparing a run using a homogeneous neural network against the bandwidth loads reported in comparable works, while also demonstrating the increased level of detail our simulator reaches. Finally, we show the impact heterogeneous connectivity can have on bandwidth and how different neuron mapping algorithms can enhance this effect.
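The basic bandwidth-load estimate such a simulator produces can be sketched for the simplest case: a 2D mesh with deterministic XY routing, accumulating spike traffic on each link it traverses. The topology, routing choice, and traffic numbers below are illustrative; the paper's simulator models topologies, protocols, and mapping algorithms in far more detail.

```python
# Minimal sketch (not the paper's simulator): per-link bandwidth load on a
# 2D mesh under XY routing, from a spike-traffic matrix between mesh nodes.

def xy_route(src, dst):
    """Links visited by XY routing between mesh nodes given as (x, y)."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:                       # route along x first...
        step = 1 if dx > x else -1
        links.append(((x, y), (x + step, y)))
        x += step
    while y != dy:                       # ...then along y
        step = 1 if dy > y else -1
        links.append(((x, y), (x, y + step)))
        y += step
    return links

def link_loads(traffic):
    """traffic: {(src_node, dst_node): spikes_per_second}."""
    loads = {}
    for (src, dst), rate in traffic.items():
        for link in xy_route(src, dst):
            loads[link] = loads.get(link, 0) + rate
    return loads

loads = link_loads({((0, 0), (1, 1)): 100, ((0, 0), (1, 0)): 50})
```

Heterogeneous connectivity changes the traffic matrix, and neuron mapping algorithms change which node each population sits on; both show up directly as shifted per-link loads in this model.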
DOI: 10.1109/MCSoC51149.2021.00054 | Published: 2021-12-01
Citations: 4
EEG-based Positive-Negative Emotion Classification Using Machine Learning Techniques
Yuta Kasuga, Jungpil Shin, Md. Al Mehedi Hasan, Y. Okuyama, Yoichi Tomioka
The aim of this study is to identify electrodes that are useful for EEG-based positive-negative emotion classification. We collected EEG signals from 30 people aged 19-38 using 14 electrodes, with two movies used to elicit positive and negative emotions. We first extracted the power spectrum from the EEG data, normalized it, and extracted frequency-domain statistical parameters. When these features were applied to Random Forests (RF), accuracies of 85.4%, 83.8%, and 83.4% were obtained for the P8, P7, and FC6 electrodes, respectively. This indicates that P8, P7, and FC6 are useful electrodes for positive-negative emotion classification.
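The feature-extraction step described (power spectrum, normalization, frequency-domain statistics) can be sketched per channel with an FFT. The band definitions below are the conventional theta/alpha/beta ranges and are an assumption; the paper does not specify its exact parameters here.

```python
import numpy as np

# Sketch of the feature step described above (illustrative parameters):
# per-channel power spectrum via FFT, normalized, then summarized as
# relative power in standard EEG frequency bands.

def band_power_features(signal, fs, bands=((4, 8), (8, 13), (13, 30))):
    """signal: 1-D EEG channel; fs: sampling rate in Hz.
    Returns the fraction of total spectral power in each band
    (defaults: theta 4-8 Hz, alpha 8-13 Hz, beta 13-30 Hz)."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    power = power / power.sum()  # normalize to relative power
    return [power[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in bands]
```

Stacking such features over the 14 electrodes gives the per-electrode feature vectors that the Random Forest classifier is then trained on.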
DOI: 10.1109/MCSoC51149.2021.00027 | Published: 2021-12-01
Citations: 1
An Intelligent Plant Dissease Detection System for Smart Hydroponic Using Convolutional Neural Network
Aminu Musa, Mohamed Hamada, F. Aliyu, Mohammed Hassan
Recently, researchers have proposed automating hydroponic systems to improve efficiency and minimize manpower requirements, thereby increasing profit and farm produce. However, a fully automated hydroponic system should be able to identify conditions such as plant diseases, nutrient deficiency, and inadequate water supply; failure to detect these issues can lead to crop damage and loss of capital. This paper presents an Internet of Things-based machine learning system for plant disease detection using a Deep Convolutional Neural Network (DCNN). The model was trained on a data set of 54,309 images covering 38 classes of plant disease, retrieved from the PlantVillage database. The system achieved an accuracy of 98.0% and an AUC precision score of 88.0%.
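A DCNN of the kind described stacks convolution, nonlinearity, and pooling layers. The NumPy sketch below shows one such building block in isolation (illustrative only; it is not the trained model, and real networks use optimized library kernels).

```python
import numpy as np

# Building-block sketch: the 2-D convolution + ReLU + max-pool stage that a
# DCNN such as the one described stacks many times (illustrative only).

def conv2d(image, kernel):
    """Valid (no-padding) 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def relu(x):
    """Elementwise rectifier: negative activations become zero."""
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling; trims edges not divisible by size."""
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

Chaining many such stages, followed by fully connected layers and a 38-way softmax, yields a classifier of the shape the abstract describes.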
{"title":"An Intelligent Plant Dissease Detection System for Smart Hydroponic Using Convolutional Neural Network","authors":"Aminu Musa, Mohamed Hamada, F. Aliyu, Mohammed Hassan","doi":"10.1109/MCSoC51149.2021.00058","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00058","url":null,"abstract":"Recently, researchers proposed automation of hydroponic systems to improve efficiency and minimize manpower requirements. Thus increasing profit and farm produce. However, a fully automated hydroponic system should be able to identify cases such as plant diseases, lack of nutrients, and inadequate water supply. Failure to detect these issues can lead to damage of crops and loss of capital. This paper presents an Internet of Things-based machine learning system for plant disease detection using Deep Convolutional Neural Network (DCNN). The model was trained on a data set of 54,309 instances containing 38 different classes of plant disease. The images were retrieved from a plant village database. The system achieved an Accuracy of 98.0% and AUC precision score of 88.0%.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121274471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
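The abstract does not give the DCNN architecture, so the following is only a rough PyTorch sketch of a 38-way image classifier of the kind described; the layer sizes, input resolution, and class count wiring are assumptions for illustration (only the 38 classes come from the paper).

```python
import torch
import torch.nn as nn

NUM_CLASSES = 38  # the PlantVillage-style class count reported in the abstract

class PlantDiseaseCNN(nn.Module):
    """A small DCNN sketch: two conv blocks followed by a linear classifier head."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # collapse spatial dims regardless of input size
            nn.Flatten(),
            nn.Linear(32, num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

model = PlantDiseaseCNN()
batch = torch.randn(4, 3, 64, 64)   # four dummy 64x64 RGB leaf images
logits = model(batch)
print(logits.shape)
```

A real deployment would train this on labeled leaf images and, as in the paper's IoT setting, run inference on frames captured from the hydroponic setup.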