
2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Publications

Mini-Batch Training along Convolution Windows for Representation Learning Based on Spike-Time-Dependent-Plasticity Rule
Yohei Shimmyo, Y. Okuyama
This paper presents a mini-batch training methodology along convolution windows for layer-wise STDP unsupervised training of convolutional layers, in order to shorten the training time of spiking neural networks (SNNs). SNNs are third-generation neural networks that use a more accurate neuron model than the rate-coded models used in conventional artificial neural networks (ANNs). Mini-batches of input convolution windows are convolved at once; the input, the output, and the current filter then generate a batch of weight updates in a single step, which reduces the overhead of library calls and GPU execution. Batch processing has allowed larger and deeper models to be trained in ANNs, whereas many evaluations of direct SNN training methodologies remain limited to smaller models; at present, training large-scale SNNs is virtually impossible. We evaluated the effect of mini-batch processing on training speed and feature-extraction power across various mini-batch sizes. The results show that a larger mini-batch size lets us utilize GPUs effectively while maintaining comparable feature-extraction power. We conclude that mini-batch training along convolution windows reduces the training time of the STDP training rule.
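As a rough illustration of the batching idea above, the following NumPy sketch applies one STDP weight update to an entire mini-batch of convolution windows at once, using a simplified multiplicative STDP rule; the array shapes, rule constants, and function name are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def batched_stdp_update(pre_windows, post_spikes, weights,
                        a_plus=0.004, a_minus=0.003):
    """One batched STDP update over a mini-batch of convolution windows.

    pre_windows : (B, K) binary array; 1 if the pre-synaptic input inside the
                  window spiked before the post-synaptic neuron (simplified).
    post_spikes : (B, N) binary array; 1 if output neuron n fired for window b.
    weights     : (N, K) filter weights, assumed to lie in [0, 1].
    The whole mini-batch contributes through two matrix products instead of
    B separate per-window updates.
    """
    B = pre_windows.shape[0]
    # Potentiate synapses whose input fired before the post spike, depress the rest.
    potentiate = post_spikes.T @ pre_windows           # (N, K) counts
    depress    = post_spikes.T @ (1.0 - pre_windows)   # (N, K) counts
    dw = (a_plus * potentiate - a_minus * depress) / B
    # Multiplicative soft bounds keep weights inside [0, 1].
    weights = weights + dw * weights * (1.0 - weights)
    return np.clip(weights, 0.0, 1.0)
```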
DOI: 10.1109/MCSoC51149.2021.00052 (published 2021-12-01)
Citations: 0
2QoSM: A Q-Learner QoS Manager for Application-Guided Power-Aware Systems
Michael J. Giardino, D. Schwyn, Bonnie H. Ferri, A. Ferri
This paper describes the design and performance of a Q-learning-based quality-of-service manager (2QoSM) for compute-aware applications (CAAs), built as part of a platform-agnostic resource management framework. CAAs and hardware share performance metrics with the 2QoSM, which can then attempt to reconfigure both to meet performance targets. This enables many co-design benefits while preserving policy and platform portability. Using Q-learning allows the power-management policy to be generated online without detailed knowledge of system states or actions, and it can pursue different goals, including error minimization, power minimization, or a combination of both. Evaluated on an embedded MCSoC controlling a mobile robot, 2QoSM reduces power consumption by 38.7-42.6% compared to the Linux on-demand governor and by 4.0-10.2% compared to a situation-aware governor. An error-minimization policy reduces path-following error by 4.6-8.9%.
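The following sketch shows, in a minimal tabular form, how a Q-learner can blend error and power into a single reward and update its policy online; the state/action encoding, reward weights, and class name are illustrative assumptions rather than the 2QoSM design.

```python
import random
from collections import defaultdict

class TinyQoSManager:
    """Tabular Q-learning over coarse (performance-bin, power-bin) states;
    actions are reconfiguration knobs (e.g. DVFS levels). Illustrative only."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1,
                 w_error=1.0, w_power=0.5):
        self.q = defaultdict(float)              # (state, action) -> value
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.w_error, self.w_power = w_error, w_power

    def reward(self, error, power):
        # Negative weighted cost: the weights select error-, power-, or mixed goals.
        return -(self.w_error * error + self.w_power * power)

    def choose(self, state):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, error, power, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = self.reward(error, power) + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

mgr = TinyQoSManager(actions=["dvfs_low", "dvfs_mid", "dvfs_high"])
a = mgr.choose(state=("err_high", "pwr_low"))
mgr.update(("err_high", "pwr_low"), a, error=0.3, power=0.2,
           next_state=("err_mid", "pwr_low"))
```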
DOI: 10.1109/MCSoC51149.2021.00040 (published 2021-12-01)
Citations: 0
Scheduling DAGs of Multi-Version Multi-Phase Tasks on Heterogeneous Real-Time Systems
Julius Roeder, Benjamin Rouxel, C. Grelck
Heterogeneous high-performance embedded systems are increasingly used in industry. Nowadays, these platforms embed accelerator-style components, such as GPUs, alongside different CPU cores. We use multiple alternative versions (implementations) of each task to fully exploit the heterogeneous capacities of such platforms and to cope with binary incompatibility. Implementations targeting accelerators require access not only to the accelerator but also to a CPU core, e.g., for pre-processing and branching the control flow. Hence, accelerator workloads naturally divide into multiple phases (e.g., CPU, GPU, CPU). We propose an asynchronous scheduling approach that exploits these phases and thereby enables fine-grained scheduling of tasks that require two types of hardware. We show that our approach can increase the schedulability rate by up to 24% over two multi-version, phase-unaware schedulers. Additionally, we demonstrate that the schedulability rate of our heuristic is close to the optimal one.
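To make the multi-version, multi-phase task model concrete, the sketch below represents a task whose versions are sequences of CPU- or GPU-bound phases and picks a version naively; the data structures and selection rule are illustrative assumptions, not the scheduler proposed in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Phase:
    resource: str      # "CPU" or "GPU"
    wcet: float        # worst-case execution time on that resource

@dataclass
class Version:
    phases: List[Phase]   # e.g. [CPU pre-processing, GPU kernel, CPU post-processing]

    def wcet(self) -> float:
        return sum(p.wcet for p in self.phases)

@dataclass
class Task:
    name: str
    versions: List[Version]

    def pick_version(self, gpu_available: bool) -> Version:
        """Naive choice: fastest version whose phases can all be hosted."""
        feasible = [v for v in self.versions
                    if gpu_available or all(p.resource == "CPU" for p in v.phases)]
        return min(feasible, key=Version.wcet)

# A GPU-accelerated version is naturally split into CPU/GPU/CPU phases.
fft = Task("fft", [
    Version([Phase("CPU", 12.0)]),
    Version([Phase("CPU", 1.0), Phase("GPU", 3.0), Phase("CPU", 1.0)]),
])
print(fft.pick_version(gpu_available=True).wcet())   # 5.0
```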
DOI: 10.1109/MCSoC51149.2021.00016 (published 2021-12-01)
Citations: 6
Efficient Resource Shared RISC-V Multicore Processor
Md. Ashraful Islam, Kenji Kise
Edge computing pushes computational loads from the cloud to embedded devices, where data is processed near its source. Heterogeneous multicore architectures are believed to be a promising way to meet these edge computing requirements. In FPGAs, a heterogeneous multicore is realized as multiple soft processor cores with custom processing elements. Since an FPGA is a resource-constrained device, sharing hardware resources among the soft processor cores can be advantageous. Some research has focused on sharing resources among soft processors, but it does not study how much FPGA logic can be saved for a five-stage pipelined processor. This paper proposes the microarchitecture of a five-stage pipelined scalar processor that allows functional units to be shared among multiple cores. We then investigate the performance and hardware resource utilization of a four-core processor. We find that sharing different functional units can reduce LUT usage by 23.5% and DSP usage by 75%. We analyze the performance impact of sharing using the Embench benchmark suite by simulating the same program on all four cores. Our simulation results indicate that, depending on the sharing configuration, the average performance drop ranges from 2.9% to 22.3%.
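A cycle-accurate hardware description is out of scope here, but the contention introduced by sharing a functional unit can be mimicked with a toy round-robin arbiter, sketched below purely for intuition; the core count, single-cycle latency, and arbitration policy are assumptions, not the proposed microarchitecture.

```python
def simulate_shared_unit(requests, n_cores=4):
    """Toy model: each cycle, every core may request the single shared unit
    (e.g. a multiplier); a round-robin arbiter grants one request per cycle.

    requests[c] is a list of cycles at which core c wants the unit.
    Returns the stall cycles each core accumulates from contention.
    """
    pending = [list(r) for r in requests]
    stalls = [0] * n_cores
    rr = 0                      # round-robin pointer
    cycle = 0
    while any(pending):
        # Cores whose next request is due at or before this cycle compete.
        ready = [c for c in range(n_cores) if pending[c] and pending[c][0] <= cycle]
        if ready:
            # Grant the first ready core at or after the round-robin pointer.
            winner = min(ready, key=lambda c: (c - rr) % n_cores)
            stalls[winner] += cycle - pending[winner].pop(0)
            rr = (winner + 1) % n_cores
        cycle += 1
    return stalls

# Four cores all asking for the unit in the same cycle: three of them stall.
print(simulate_shared_unit([[0], [0], [0], [0]]))   # [0, 1, 2, 3]
```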
DOI: 10.1109/MCSoC51149.2021.00061 (published 2021-12-01)
Citations: 0
Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs
Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura
Batched Basic Linear Algebra Subprograms (BLAS) provide an interface that allows multiple problems for a given BLAS routine (operation), with parameters and sizes independent of each other, to be computed in a single routine call. Batching was introduced to use the cores of many-core processors efficiently when computing many small problems from which sufficient parallelism cannot be extracted individually. The major goal of this study is to automatically generate high-performance batched routines for all BLAS routines from a non-batched BLAS implementation and OpenMP on CPUs. The primary challenge is the task scheduling method that allocates batches to cores. We propose a scheduling method based on a greedy algorithm, which allocates batches according to their costs in advance to eliminate load imbalance when batch costs vary. We then investigate the performance of five scheduling methods, including those implemented in OpenMP and our proposed method, on matrix multiplication (GEMM) and matrix-vector multiplication (GEMV) under several conditions and environments. We find that the optimal scheduling strategy differs depending on the problem setting and environment. Based on this result, we propose an automatic generation scheme for batched BLAS from non-batched BLAS that can incorporate arbitrary task scheduling. This scheme facilitates the development of batched routines for the full set of BLAS routines and for special BLAS implementations such as high-precision versions.
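The greedy, cost-aware pre-assignment described above is close in spirit to longest-processing-time list scheduling; the sketch below shows one such static assignment of batch entries to cores, where the cost estimates and function name are illustrative and the paper's actual cost model may differ.

```python
import heapq

def greedy_assign(batch_costs, n_cores):
    """Pre-assign batch entries to cores so per-core total cost is balanced.

    batch_costs : list of estimated costs, e.g. proportional to m*n*k per GEMM.
    Returns a list of index lists, one per core.
    """
    # Place the most expensive entries first, onto the currently least-loaded core.
    order = sorted(range(len(batch_costs)), key=lambda i: -batch_costs[i])
    heap = [(0.0, core) for core in range(n_cores)]   # (accumulated cost, core id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_cores)]
    for i in order:
        load, core = heapq.heappop(heap)
        assignment[core].append(i)
        heapq.heappush(heap, (load + batch_costs[i], core))
    return assignment

# Mixed GEMM sizes: the two large problems end up on different cores.
print(greedy_assign([8.0, 1.0, 1.0, 6.0, 2.0, 2.0], n_cores=2))
```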
DOI: 10.1109/MCSoC51149.2021.00042 (published 2021-12-01)
Citations: 1
Detection of Cache Side Channel Attacks Using Thread Level Monitoring of Hardware Performance Counters
Pavitra Prakash Bhade, Sharad Sinha
Modern multiprocessor systems adopt optimization techniques to boost execution speed. These optimizations create vulnerabilities that attackers can exploit, causing security breaches. The hierarchical structure of cache memory, where the Last Level Cache is a superset of the previous levels and is shared among multiple processor cores, creates an attack vector for cache side-channel attacks (SCAs). In such attacks, the attacker traces the execution pattern of a victim process and retrieves secret information by monitoring the shared cache. Mitigation techniques against such attacks trade security against overall system performance, so mitigation should be applied only when an attack is detected. We propose an architecture-agnostic approach that uses hardware performance counters at run time and at the thread level to detect cache SCAs, instead of the current state of the art, which uses counters at the system level. The proposed approach reduces false positives by 48% compared with system-level approaches. The performance trade-off is therefore also reduced, which makes the proposed approach especially significant for embedded systems, where processor cycles are a limited resource.
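The core idea of watching per-thread counter behaviour rather than system-wide totals can be sketched as follows; the counter fields, thresholds, and the simple miss-ratio test are assumptions for illustration, and the platform-specific mechanism for reading per-thread counters is not shown.

```python
from dataclasses import dataclass

@dataclass
class ThreadSample:
    """One sampling interval of hardware performance counters for one thread."""
    thread_id: int
    llc_misses: int
    llc_references: int
    instructions: int

def looks_like_cache_attack(sample: ThreadSample,
                            miss_ratio_threshold: float = 0.4,
                            min_references: int = 10_000) -> bool:
    """Flag a thread whose last-level-cache behaviour is suspicious.

    Prime+Probe-style attackers repeatedly touch the shared LLC, so their
    per-thread miss ratio stays abnormally high over sustained intervals.
    System-wide counters would dilute this signal across all threads.
    """
    if sample.llc_references < min_references:
        return False                      # too little activity to judge
    miss_ratio = sample.llc_misses / sample.llc_references
    return miss_ratio > miss_ratio_threshold

suspect = ThreadSample(thread_id=42, llc_misses=90_000,
                       llc_references=120_000, instructions=4_000_000)
print(looks_like_cache_attack(suspect))   # True for this synthetic sample
```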
DOI: 10.1109/MCSoC51149.2021.00039 (published 2021-12-01)
Citations: 2
A Framework and Its User Interface to Learn Machine Learning Models
Atsushi Takamiya, Md. Mostafizer Rahman, Y. Watanobe
Developing a machine learning (ML) system requires understanding a range of topics, such as prerequisite knowledge, implementation procedures, verification methods, and improvement methods. However, although general learning sites on the Web provide extensive learning content such as videos and textbooks, they are insufficient for acquiring practical skills. In this paper, we propose a framework for learning ML together with its user interface. The framework manages the ML learning phases, which include learning theory and practical knowledge, implementation, validation, improvement, and completion. In the model validation phase, checks are applied automatically according to the target ML model; similarly, in the model improvement phase, improvement methods are applied automatically according to the target ML model. As a case study, we have developed content on linear regression, classification, clustering, and dimensionality reduction.
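A minimal sketch of the phase bookkeeping such a framework needs is given below; the phase names follow the abstract, while the model categories and check functions are hypothetical placeholders for the automatic checks the framework would supply.

```python
PHASES = ["theory", "implementation", "validation", "improvement", "completion"]

# Hypothetical per-model-type hooks; a real framework would plug in concrete
# checks (e.g. residual analysis for regression, accuracy for classification)
# and matching improvement suggestions automatically.
CHECKS = {
    "linear_regression": lambda model: model.get("r2", 0.0) > 0.7,
    "classification":    lambda model: model.get("accuracy", 0.0) > 0.8,
}

def advance(state):
    """Move a learner's project one phase forward when its check passes."""
    phase = state["phase"]
    if phase == "validation" and not CHECKS[state["kind"]](state["model"]):
        state["phase"] = "improvement"          # loop back with suggestions
        return state
    nxt = PHASES.index(phase) + 1
    state["phase"] = PHASES[min(nxt, len(PHASES) - 1)]
    return state

project = {"kind": "classification", "phase": "validation",
           "model": {"accuracy": 0.75}}
print(advance(project)["phase"])   # 'improvement' -- the accuracy check failed
```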
DOI: 10.1109/MCSoC51149.2021.00059 (published 2021-12-01)
Citations: 0
Trends and Challenges in Ensuring Security for Low-Power and High-Performance Embedded SoCs*
Parisa Rahimi, Ashutosh Kumar Singh, Xiaohang Wang, Alok Prakash
In recent years, security, power consumption, and performance have become important issues in the design of embedded SoCs. With the growing number of embedded devices in automotive electronics and electric vehicles, real-time systems, robotics, artificial intelligence, smart technologies, and telecommunications, it is highly likely that these systems will be exposed to attacks or threats. Implementing security measures on such devices is therefore not easy, and it becomes even more challenging when performance and power must also be considered, given the limited computing resources and the fact that the devices often run on batteries. In this paper, we survey the weaknesses of embedded SoCs and examine attacks, power consumption, and performance more closely, with a main focus on physical and side-channel attacks, which have not been surveyed previously. Along with current trends and challenges, upcoming trends and challenges are also elaborated. This paper is intended to help researchers and system designers gain deep insight into designing secure, power-efficient, and high-performance embedded SoCs in the future.
DOI: 10.1109/MCSoC51149.2021.00041 (published 2021-12-01)
Citations: 4
Light-weight Enhanced Semantics-Guided Neural Networks for Skeleton-Based Human Action Recognition
Hongbo Chen, Lei Jing
In skeleton-based human action recognition, methods based on graph convolutional networks have recently achieved great success. However, most graph neural networks rely on a large number of parameters, which makes them hard to train and computationally expensive. In this context, the simple yet effective semantics-guided neural network (SGN) achieves good results with only a few parameters. However, its simple use of semantics limits the improvement in recognition rate, and its single fixed temporal convolution kernel is not enough to extract temporal details comprehensively. To this end, we propose an enhanced semantics-guided neural network (ESGN). Simple but effective strategies are applied in ESGN, such as semantic expansion, graph pooling, and a regularization loss function, which do not significantly increase the parameter count but improve accuracy over SGN on two large datasets. The proposed method, an order of magnitude smaller than most previous models, is evaluated on NTU60 and NTU120; the experimental results show that it achieves state-of-the-art performance.
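One generic way to address the single-fixed-kernel limitation mentioned above is a multi-branch temporal convolution; the PyTorch sketch below shows such a module, with kernel sizes, channel split, and module name chosen as assumptions rather than taken from the ESGN architecture.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Several temporal kernel sizes in parallel over (N, C, T, V) skeleton
    features (T = frames, V = joints), concatenated along channels."""

    def __init__(self, in_channels, out_channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_channels = out_channels // len(kernel_sizes)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_channels, branch_channels,
                          kernel_size=(k, 1), padding=(k // 2, 0)),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for k in kernel_sizes
        )

    def forward(self, x):
        return torch.cat([branch(x) for branch in self.branches], dim=1)

# 8 sequences, 64 channels, 20 frames, 25 joints.
x = torch.randn(8, 64, 20, 25)
print(MultiScaleTemporalConv(64, 96)(x).shape)   # torch.Size([8, 96, 20, 25])
```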
DOI: 10.1109/MCSoC51149.2021.00036 (published 2021-12-01)
Citations: 0
Multi-objective Reinforcement Learning for Energy Harvesting Wireless Sensor Nodes
Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura
Modern Energy Harvesting Wireless Sensor Nodes (EHWSNs) need to intelligently allocate their limited and unreliable energy budget among multiple tasks to ensure long-term uninterrupted operation. Traditional solutions are ill-equipped to deal with multiple objectives and to execute a posteriori tradeoffs. We propose a general Multi-objective Reinforcement Learning (MORL) framework for Energy Neutral Operation (ENO) of EHWSNs. The framework consists of a novel Multi-objective Markov Decision Process (MOMDP) formulation and two novel MORL algorithms. Using our framework, EHWSNs can learn policies that maximize multiple task objectives and perform dynamic runtime tradeoffs. The high computation and learning costs usually associated with powerful MORL algorithms are avoided by our comparatively less resource-intensive algorithms. We evaluate the framework on general single-task and dual-task EHWSN system models through simulations and show that our MORL algorithms can successfully trade off between multiple objectives at runtime.
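A common way to realise MORL with runtime tradeoffs is to keep vector-valued Q-values and scalarise them with a tunable weight vector when acting; the tabular sketch below illustrates that idea only, and the state/action encoding and reward terms are assumptions, not the algorithms proposed in the paper.

```python
import random
import numpy as np
from collections import defaultdict

class MorlAgent:
    """Tabular MORL sketch: Q-values are kept as vectors (one entry per
    objective, e.g. task utility and energy neutrality); a weight vector
    scalarises them only when choosing actions, so the tradeoff can be
    shifted at runtime without discarding the learned values."""

    def __init__(self, actions, n_objectives=2, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = defaultdict(lambda: np.zeros(n_objectives))
        self.actions = actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, state, weights):
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: float(weights @ self.q[(state, a)]))

    def learn(self, state, action, reward_vec, next_state, weights):
        # Bootstrap from the action that is greedy under the current weights.
        best = max(self.actions,
                   key=lambda a: float(weights @ self.q[(next_state, a)]))
        target = np.asarray(reward_vec, dtype=float) + self.gamma * self.q[(next_state, best)]
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

agent = MorlAgent(actions=[0, 1, 2])          # e.g. sensing duty-cycle levels
w = np.array([0.7, 0.3])                      # favour task utility over energy slack
agent.learn((5, 2), 1, (0.9, -0.1), (5, 3), w)
print(agent.act((5, 2), w))
```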
DOI: 10.1109/MCSoC51149.2021.00022 (published 2021-12-01)
Citations: 1