Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00052
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Title: Mini-Batch Training along Convolution Windows for Representation Learning Based on Spike-Time-Dependent-Plasticity Rule
Authors: Yohei Shimmyo, Y. Okuyama
Abstract: This paper presents a mini-batch training methodology along convolution windows for layer-wise unsupervised STDP training of convolutional layers, with the goal of shortening the training time of spiking neural networks (SNNs). SNNs are third-generation neural networks that use a more accurate neuron model than the rate-coded models of conventional artificial neural networks (ANNs). A mini-batch of input convolution windows is convolved at once; the input, output, and current filter then generate a batch of weight updates in a single step, reducing library-call and GPU-launch overheads. Batch processing is what allows large ANN models to be trained, whereas evaluations of direct SNN training methodologies have mostly been limited to small models, and training large-scale SNNs is currently virtually impossible. We evaluated the effect of mini-batch processing on training speed and feature extraction power across various mini-batch sizes. The results show that larger mini-batch sizes utilize GPUs more effectively while maintaining comparable feature extraction power. We conclude that mini-batch training along convolution windows reduces training time under the STDP training rule.

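The core batching idea — gathering im2col-style convolution windows and emitting all weight updates in one matrix product — can be sketched as below. This is a minimal illustration assuming a simplified pair-based STDP rule with binary spikes; the function name, shapes, and the `a_plus`/`a_minus` rates are hypothetical placeholders, not the paper's exact formulation.

```python
import numpy as np

def batched_stdp_update(pre_windows, post_spikes, w, a_plus=0.01, a_minus=0.005):
    """One batched STDP-like update over convolution windows.

    pre_windows: (batch, k) binary input spikes, one row per im2col window
    post_spikes: (batch, n_filters) binary output spikes for those windows
    w:           (n_filters, k) current filter weights
    """
    # Potentiation where pre and post both fired; depression where the
    # post-neuron fired but the pre-synapse was silent. Both terms are
    # accumulated over the whole mini-batch in one matrix product each.
    pot = post_spikes.T @ pre_windows              # (n_filters, k)
    dep = post_spikes.T @ (1.0 - pre_windows)      # (n_filters, k)
    return w + a_plus * pot - a_minus * dep

# Toy usage: 32 windows of a 3x3 patch, 4 filters.
rng = np.random.default_rng(0)
pre = (rng.random((32, 9)) < 0.2).astype(float)
post = (rng.random((32, 4)) < 0.1).astype(float)
w = batched_stdp_update(pre, post, rng.random((4, 9)))
```

The point of the batching is that the two matrix products replace a per-window update loop, which is what lets a GPU BLAS call amortize its launch overhead.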
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00040
Title: 2QoSM: A Q-Learner QoS Manager for Application-Guided Power-Aware Systems
Authors: Michael J. Giardino, D. Schwyn, Bonnie H. Ferri, A. Ferri
Abstract: This paper describes the design and performance of a Q-learning-based quality-of-service manager (2QoSM) for compute-aware applications (CAAs), part of a platform-agnostic resource management framework. CAAs and hardware share performance metrics with the 2QoSM, which can reconfigure both to meet performance targets, enabling co-design benefits while preserving policy and platform portability. Q-learning generates the power management policy online without requiring detailed models of system states or actions, and can target different goals, including error minimization, power minimization, or a combination of both. Evaluated on an embedded MCSoC controlling a mobile robot, the 2QoSM reduces power by 38.7-42.6% compared to the Linux on-demand governor and by 4.0-10.2% compared to a situation-aware governor. An error-minimization policy reduces path-following error by 4.6-8.9%.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00016
Title: Scheduling DAGs of Multi-Version Multi-Phase Tasks on Heterogeneous Real-Time Systems
Authors: Julius Roeder, Benjamin Rouxel, C. Grelck
Abstract: Heterogeneous high-performance embedded systems are increasingly used in industry. These platforms embed accelerator-style components, such as GPUs, alongside different CPU cores. To fully exploit the heterogeneous capacities of such platforms, and because binaries are not portable across component types, we use multiple alternative versions (implementations) of each task. Implementations targeting accelerators require not only the accelerator but also a CPU core, e.g., for pre-processing and control-flow branching. Accelerator workloads can therefore naturally be divided into multiple phases (e.g., CPU, GPU, CPU). We propose an asynchronous scheduling approach that exploits these phases, enabling fine-grained scheduling of tasks that require two types of hardware. We show that our approach increases the schedulability rate by up to 24% over two multi-version phase-unaware schedulers, and that the schedulability rate of our heuristic is close to the optimal schedulability rate.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00061
Title: Efficient Resource Shared RISC-V Multicore Processor
Authors: Md. Ashraful Islam, Kenji Kise
Abstract: Edge computing pushes computational loads from the cloud to embedded devices, so that data is processed near its source. Heterogeneous multicore architectures are a promising way to meet edge computing requirements; on FPGAs, they are realized as multiple soft processor cores with custom processing elements. Since FPGAs are resource-constrained, sharing hardware resources among the soft processor cores can be advantageous. Prior research has examined resource sharing among soft processors, but not how much FPGA logic can be saved for a five-stage pipelined processor. This paper proposes the microarchitecture of a five-stage pipelined scalar processor that allows multiple cores to share execution functional units. We investigate the performance and hardware resource utilization of a four-core processor and find that sharing different functional units saves up to 23.5% of LUT usage and 75% of DSP usage. We analyze the performance impact of sharing with programs from the Embench benchmark suite, simulating the same program on all four cores. Our simulation results indicate that, depending on the sharing configuration, the average performance drop ranges from 2.9% to 22.3%.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00042
Title: Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs
Authors: Daichi Mukunoki, Yusuke Hirota, Toshiyuki Imamura
Abstract: Batched Basic Linear Algebra Subprograms (BLAS) provide an interface for computing many independent instances of a BLAS routine (operation), with different parameters and sizes, in a single call. Batching was introduced to use the cores of many-core processors efficiently when computing many small problems, no single one of which offers sufficient parallelism on its own. The main goal of this study is to automatically generate high-performance batched routines for all BLAS routines from a non-batched BLAS implementation using OpenMP on CPUs; the primary challenge is the task scheduling method that allocates batches to cores. We propose a scheduling method based on a greedy algorithm that allocates batches according to their costs in advance, eliminating load imbalance when batch costs vary. We then evaluate five scheduling methods, including those implemented in OpenMP and our proposed method, on matrix multiplication (GEMM) and matrix-vector multiplication (GEMV) under several conditions and environments. We find that the optimal scheduling strategy differs depending on the problem setting and environment. Based on this result, we propose an automatic generation scheme for batched BLAS from non-batched BLAS that can incorporate arbitrary task scheduling, facilitating the development of batched routines for the full set of BLAS routines and for special BLAS implementations such as high-precision versions.

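The greedy cost-based allocation described above resembles classic longest-processing-time-first list scheduling: estimate each batch's cost, then repeatedly give the most expensive remaining batch to the least-loaded core. The sketch below works under that assumption; the paper's actual cost model and tie-breaking may differ.

```python
import heapq

def greedy_schedule(costs, n_cores):
    """Assign batches to cores, largest estimated cost first, always to the
    currently least-loaded core. Returns a list of batch indices per core."""
    heap = [(0.0, core) for core in range(n_cores)]   # (load, core id)
    heapq.heapify(heap)
    assign = [[] for _ in range(n_cores)]
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, core = heapq.heappop(heap)              # least-loaded core
        assign[core].append(i)
        heapq.heappush(heap, (load + costs[i], core))
    return assign

# For GEMM batches, a natural cost estimate is the flop count 2*m*n*k.
def gemm_cost(m, n, k):
    return 2.0 * m * n * k
```

Because the costs are known before execution (from the problem sizes in the batch descriptor), the whole assignment can be computed up front, avoiding runtime work-stealing overhead when batch sizes are highly skewed.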
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00039
Title: Detection of Cache Side Channel Attacks Using Thread Level Monitoring of Hardware Performance Counters
Authors: Pavitra Prakash Bhade, Sharad Sinha
Abstract: Modern multiprocessor systems adopt optimization techniques to speed up execution, but these optimizations create vulnerabilities that attackers can exploit. The hierarchical cache structure, in which the last-level cache (LLC) is a superset of the previous levels and is shared between multiple processor cores, creates an attack vector for cache side-channel attacks (SCAs): by monitoring the shared cache, an attacker can trace the victim process's execution pattern and thereby retrieve secret information. Mitigation techniques against such attacks trade security against overall system performance, so mitigation should be applied only when an attack is detected. We propose an architecture-agnostic approach that monitors hardware performance counters at run time at the thread level, in contrast to the current state of the art, which monitors them at the system level. The proposed approach reduces false positives by 48% compared with system-level approaches, shrinking the performance trade-off; it is therefore especially significant for embedded systems, where processor cycles are a limited resource.

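At its simplest, thread-level detection amounts to sampling per-thread counters and flagging outliers. The toy classifier below, with a hypothetical miss-rate threshold, only illustrates the kind of per-thread decision involved (a Flush+Reload-style probe loop tends to show an unusually high LLC miss rate); it is not the authors' detector, which the paper should be consulted for.

```python
def flag_suspicious_threads(samples, miss_rate_threshold=0.2):
    """samples: {tid: (llc_misses, instructions)} read from per-thread
    hardware performance counters over one sampling window.

    Returns the thread ids whose LLC misses-per-instruction exceed the
    (illustrative) threshold; threads with no retired instructions in the
    window are skipped.
    """
    flagged = []
    for tid, (misses, instructions) in samples.items():
        if instructions > 0 and misses / instructions > miss_rate_threshold:
            flagged.append(tid)
    return flagged
```

The advantage of the thread-level view is visible even in this toy: a system-wide aggregate of the same counters would dilute one hot thread across all others and either miss it or flag everything.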
Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00059
Title: A Framework and Its User Interface to Learn Machine Learning Models
Authors: Atsushi Takamiya, Md. Mostafizer Rahman, Y. Watanobe
Abstract: Developing a machine learning (ML) system requires understanding a wide range of material: prerequisite knowledge, implementation procedures, verification methods, and improvement methods. General learning sites on the Web provide extensive content such as videos and textbooks, but they are insufficient for acquiring practical skills. In this paper, we propose a framework for learning ML, together with its user interface. The framework manages the ML learning phases: learning the theory and practical knowledge, implementation, validation, improvement, and completion. In the model validation phase, checks are applied automatically according to the target ML model; likewise, in the model improvement phase, improvement methods are applied automatically according to the target ML model. As a case study, we have developed content on linear regression, classification, clustering, and dimensionality reduction.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00041
Title: Trends and Challenges in Ensuring Security for Low-Power and High-Performance Embedded SoCs*
Authors: Parisa Rahimi, Ashutosh Kumar Singh, Xiaohang Wang, Alok Prakash
Abstract: In recent years, security, power consumption, and performance have become key issues in embedded SoC design. With the growing number of embedded devices in automotive electronics and electric vehicles, real-time systems, robotics, artificial intelligence, smart technologies, and telecommunications, it is highly likely that these systems will be exposed to attacks or threats. Securing such devices is difficult, and it becomes even more challenging when performance and power must also be considered, given the limited computing resources available and the fact that many of these devices run on batteries. In this paper, we survey the weaknesses of embedded SoCs and examine attacks, power consumption, and performance, focusing on physical and side-channel attacks, which have not been surveyed previously. Both current and upcoming trends and challenges are elaborated. This paper is intended to help researchers and system designers gain deep insight into designing secure, power-efficient, and high-performance embedded SoCs.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00036
Title: Light-weight Enhanced Semantics-Guided Neural Networks for Skeleton-Based Human Action Recognition
Authors: Hongbo Chen, Lei Jing
Abstract: In skeleton-based human action recognition, methods based on graph convolutional networks have recently had great success. However, most graph neural networks rely on a large number of parameters, which makes them hard to train and computationally expensive. The simple yet effective semantics-guided neural network (SGN) achieves good results with few parameters, but its simple use of semantics limits the improvement in recognition rate, and its single fixed temporal convolution kernel cannot extract temporal details comprehensively. To this end, we propose an enhanced semantics-guided neural network (ESGN). We apply simple but effective strategies to the ESGN, such as semantic expansion, graph pooling, and a regularization loss function, which do not significantly increase the parameter count but improve accuracy over the SGN on two large datasets. The proposed method, an order of magnitude smaller than most previous models, is evaluated on NTU60 and NTU120; the experimental results show that it achieves state-of-the-art performance.

Pub Date: 2021-12-01 | DOI: 10.1109/MCSoC51149.2021.00022
Title: Multi-objective Reinforcement Learning for Energy Harvesting Wireless Sensor Nodes
Authors: Shaswot Shresthamali, Masaaki Kondo, Hiroshi Nakamura
Abstract: Modern Energy Harvesting Wireless Sensor Nodes (EHWSNs) need to intelligently allocate their limited and unreliable energy budget among multiple tasks to ensure long-term uninterrupted operation. Traditional solutions are ill-equipped to deal with multiple objectives and to execute a posteriori tradeoffs. We propose a general Multi-objective Reinforcement Learning (MORL) framework for Energy Neutral Operation (ENO) of EHWSNs, consisting of a novel Multi-objective Markov Decision Process (MOMDP) formulation and two novel MORL algorithms. Using our framework, EHWSNs can learn policies that maximize multiple task objectives and perform dynamic tradeoffs at runtime. Our MORL algorithms are comparatively light on resources, avoiding the high computation and learning costs usually associated with powerful MORL algorithms. We evaluate our framework on general single-task and dual-task EHWSN system models through simulations and show that our MORL algorithms can successfully trade off between multiple objectives at runtime.