Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00010
Fumiya Kono, N. Nakasato, N. Hirata, K. Matsumoto
Space-probe exploration of asteroids is being actively pursued to shed light on the origin of the solar system and of life. One approach toward this goal is to analyze the structure of solar system bodies by numerical simulation. GFandSlope is a code that calculates the gravitation field, slope, and attraction for given model data of small solar system bodies. With the existing sequential code, analyzing high-resolution models under different initial conditions inevitably takes a long time. Our GPU implementation runs several thousand times faster than the previous code, which should also accelerate research in space science. This paper presents an evaluation of our GPU codes for fast gravitation field analysis and discusses the numerical precision of floating-point operations on the GPU for practical application.
Title: Acceleration of Gravitation Field Analysis for Asteroids by GPU Computation
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
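The abstract's concern with floating-point precision can be illustrated with a small sketch. It is not taken from GFandSlope: the helper names `to_f32` and `sum_f32` are hypothetical, single precision is emulated on the CPU with `struct`, and the input is a synthetic list of equal contributions rather than asteroid model data. It only shows why accumulating many terms in single precision, which GPU codes often prefer for speed, can drift further from the exact sum than double precision does.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate values while rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return total


# Synthetic input (an assumption for illustration): 100000 equal contributions.
values = [to_f32(0.1)] * 100000

exact = math.fsum(values)            # correctly rounded reference sum
err32 = abs(sum_f32(values) - exact) # single-precision accumulation error
err64 = abs(sum(values) - exact)     # double-precision accumulation error

print(f"float32 accumulation error: {err32:.6g}")
print(f"float64 accumulation error: {err64:.6g}")
```

On this input the single-precision error is many orders of magnitude larger than the double-precision error, which is the kind of effect a precision study on GPU code has to quantify.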
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00020
Fumio Hamanaka, Takuto Kanamori, Kenji Kise
Deep neural networks (DNNs) are one of the key technologies for realizing autonomous driving. However, because a DNN requires a large amount of computation, it is challenging for an edge device with limited computational resources to support one. Binarized neural networks (BNNs) have been proposed to reduce latency and parameter size and are well suited to hardware implementation. Because DNN technology is still evolving and better algorithms appear over time, implementing a DNN on an FPGA is preferable to implementing it on an ASIC. In this paper, we propose a low-cost and portable mini motor car system with a BNN accelerator on an FPGA. We compare its road-tracking demonstration with that of a similar motor car using a Raspberry Pi and show the effectiveness of the FPGA for DNN implementation. The proposed system is implemented on the Nexys A7, one of the most popular FPGA development boards, which uses an Artix-7 FPGA.
Title: A Low Cost and Portable Mini Motor Car System with a BNN Accelerator on FPGA
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00030
Maoyang Xiang, T. Teo
Binary neural networks (BNNs) are particularly well suited to low-power embedded devices with limited computational capabilities. Their binary weight parameters significantly reduce the memory footprint and the number of arithmetic logic unit operations. Nevertheless, BNNs suffer from low accuracy and a sharp optimization space. Several recent studies have improved BNN accuracy in various tests by using more operations and more complicated topologies. This approach, however, is incompatible with embedded BNN applications because it requires complicated data type translation. Hence, we propose a novel approach for BNN applications on embedded systems that uses a multi-scale neural network topology and is optimized from two perspectives, hardware structure and BNN topology; it preserves more low-level information during the feed-forward process with few operations. Our network topology achieves 91.3% accuracy on the CIFAR-10 dataset, one of the highest reported for a BNN, and can process 537 tiny pictures per second when deployed on an All Programmable System on Chip (APSoC) device with 4.4 W power consumption.
Title: A Multi-scale Binarized Neural Network Application Based on All Programmable System on Chip
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
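The reduction in arithmetic operations that binarization brings can be sketched with the widely used XNOR-popcount dot product. This is a generic illustration, not the authors' network or hardware design; it assumes that both weights and activations are constrained to +1/-1, and the helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 where value i is +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors over {-1, +1}, given as packed ints."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(x * y for x, y in zip(a, b))           # ordinary multiply-accumulate
result = binary_dot(pack_bits(a), pack_bits(b), len(a))

print(reference, result)  # prints "2 2"
```

One XNOR and one popcount replace eight multiply-accumulate steps here, which is why binarized layers map well onto lookup-table logic.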
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00029
Md. Al Mehedi Hasan, Fuad Al Abir, Jungpil Shin
Real-time surface recognition has become a crucial component in ensuring the safe walking of intelligent autonomous robots in complex indoor environments where people live. Numerous recent studies have addressed this problem, but there is still room for improvement in classification accuracy and inference time. In this paper, we extract features from accelerometer and gyroscope data in the temporal, statistical, and spectral domains and classify them using a tree-based ensemble classification algorithm. When classifying 9 different surfaces, we achieve 80.81% mean accuracy with a 1.0% standard deviation in 10-fold cross-validation and a 97.25% average AUC score. Our method attains state-of-the-art accuracy with minimal inference time, which is essential for real-time recognition in autonomous robots.
Title: Surface Type Classification for Autonomous Robots Using Temporal, Statistical and Spectral Feature Extraction and Selection
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00056
Mohamed Hamada, Jesse Jeremiah Tanimu, Mohammed Hassan, H. Kakudi, Patience Robert
Cervical cancer is one of the leading causes of premature mortality among women worldwide, and more than 85% of these deaths occur in developing countries. Several risk factors are associated with cervical cancer. The aim of this research is to develop a model that predicts a patient's cervical cancer results, given risk patterns from individual medical records and preliminary screening. This work presents a machine learning method that uses the Decision Tree (DT) algorithm to analyze the risk factors of cervical cancer. Recursive Feature Elimination (RFE) and least absolute shrinkage and selection operator (LASSO) feature selection techniques were explored to determine the most important attributes for cervical cancer prediction. A comparative analysis of the two feature selection techniques was performed to show the importance of feature selection in cervical cancer prediction. Based on the results of the analysis, the proposed model achieved its highest accuracies of 98% and 96% when using DT with RFE and with LASSO feature selection, respectively.
Title: Evaluation of Recursive Feature Elimination and LASSO Regularization-based optimized feature selection approaches for cervical cancer prediction
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00047
Aika Kamei, Takuya Kojima, H. Amano, Daiki Yokoyama, Hisato Miyauchi, K. Usami, Keizo Hiraga, Kenta Suzuki, K. Bessho
In this study, we propose a second-generation coarse-grained reconfigurable array with non-volatile flip-flops (NVFFs), called the non-volatile cool mega array with multi-context (NVCMA/MC). As in the previous NVCMA, verify-and-retriable NVFFs (VR-NVFFs) are used for the configuration memory, constant memory, data memory, and instruction memory. In addition to power gating functions, the microcontroller is given dedicated instructions for controlling the store, verify, and restore operations of the NVFFs. Based on experience with the NVCMA, four hardware contexts are introduced to hold the configuration data for four tasks without a memory leakage penalty. The array size is expanded, and pipeline registers are introduced to ease the trade-off between performance and power consumption. This study focuses mainly on the energy-saving effect of the VR-NVFFs and the multi-context facility of the NVCMA/MC, including measurement of the break-even point. Evaluation of a real chip implemented in a 40 nm MTJ/MOS hybrid process technology demonstrates that the two-step store control of the VR-NVFFs reduces the store energy by 65%. Moreover, applications that run intermittently for intervals as short as approximately 3 μs can benefit from the multi-context power gating.
Title: Energy saving in a multi-context coarse grained reconfigurable array with non-volatile flip-flops
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
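The break-even point mentioned in the abstract can be sketched with the usual first-order power-gating model: gating pays off once the leakage energy saved during the idle interval exceeds the energy spent on storing and restoring state. The function names and every numeric value below are hypothetical placeholders chosen for easy arithmetic; they are not measurements from the NVCMA/MC chip.

```python
def break_even_time_us(overhead_energy_pj, leakage_power_uw):
    """Idle time (in microseconds) at which saved leakage equals the gating overhead.

    Units: picojoules divided by microwatts gives microseconds.
    """
    if leakage_power_uw <= 0:
        raise ValueError("leakage power must be positive")
    return overhead_energy_pj / leakage_power_uw


def gating_saves_energy(idle_us, overhead_energy_pj, leakage_power_uw):
    """True when an idle interval is long enough for power gating to pay off."""
    return idle_us > break_even_time_us(overhead_energy_pj, leakage_power_uw)


# Hypothetical numbers: 40 pJ to store plus 10 pJ to restore, 5 uW of leakage avoided.
store_pj, restore_pj, leakage_uw = 40.0, 10.0, 5.0
bet = break_even_time_us(store_pj + restore_pj, leakage_uw)

print(bet)                                                        # prints 10.0
print(gating_saves_energy(25.0, store_pj + restore_pj, leakage_uw))  # prints True
```

In this model, lowering the store energy shrinks the numerator and therefore shortens the break-even time.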
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00034
Kaisei Shimura, Yoichi Tomioka, Qiang Zhao
Mobility scooters have come into use to expand the range of mobility of the elderly. However, accidents involving mobility scooters have become a serious problem. For example, a mobility scooter that stops inside a railway crossing because its battery is exhausted is in great danger, since a collision with a train may occur. Measuring the distance to a railway crossing while driving helps the scooter avoid entering the crossing without sufficient battery charge. In this paper, we propose a method for predicting the distance to a railway crossing based on the railway crossing warning signs in video from a camera installed at the front of the mobility scooter. In experiments, we evaluate the proposed method using images taken at various positions relative to the railway crossing and show that it achieves higher accuracy than distance estimation using a depth sensor.
Title: A Distance Estimation Method to Railway Crossing Using Warning Signs
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00046
Tanvir Ahmed, Johannes Maximilian Kühn, Ken Namura
FPGAs are gaining traction as a platform for accelerating applications that require both high performance and specialization. However, exploiting the maximum compute potential of FPGAs remains a critical and time-consuming task that usually requires expert knowledge. Typically, designers seek to maximize the usage of hardened arithmetic blocks (DSPs, such as the DSP48 in Xilinx devices), but because their number is limited, the critical path quickly lengthens when portions of the design are mapped to lookup tables (LUTs). To mitigate the DSP limitation and to maximize FPGA utilization, we propose combining FPGA overlay accelerators with a mapping method that efficiently exploits the FPGA's layout information and resources. This mapping method relies on a two-step process: (1) extraction of the architectural and layout information of the FPGA, and (2) optimized placement of the processing elements (PEs) of the accelerator onto the FPGA resources. The placement step maps the PEs to DSPs and LUTs so as to reduce the critical path among PEs. We applied our method to implement a systolic array, a multiplier array, and a coarse-grained reconfigurable architecture (CGRA) on a Xilinx FPGA. The proposed method achieves more than a 14× increase in performance and energy efficiency over the vendor tool mapping while also increasing FPGA utilization by more than 1.5× compared to DSP-limited mappings.
Title: A Highly Efficient Layout-Aware FPGA Overlay Accelerator Mapping Method
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00014
Takuto Kanamori, Kenji Kise
The compressed instructions extension of RISC-V reduces program size. However, it requires complicated logic in the instruction fetch unit, which affects performance. In this paper, we propose an instruction fetch unit that supports compressed instructions while achieving high performance. Furthermore, we propose a RISC-V soft processor that uses this unit. We implement the proposed processor in Verilog HDL and verify its behavior using Verilog simulation and a Xilinx Artix-7 FPGA board. We compare benchmark results and hardware resource usage with those of related work. The evaluation results show that the proposed processor achieves a 42.5% performance improvement over VexRiscv, a high-performance, open-source RV32IC processor.
Title: RVCoreP-32IC: An optimized RISC-V soft processor supporting the compressed instructions
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
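Why compressed instructions complicate the fetch unit can be seen from how instruction boundaries are found. In RISC-V, a 16-bit parcel whose two lowest bits are not `11` is a compressed instruction; otherwise, in RV32IC, it is the first half of a 32-bit instruction. The sketch below is a software illustration of that rule only, not the proposed fetch unit; the function name `split_instructions` is hypothetical, and longer encodings beyond 32 bits are ignored by assumption.

```python
def split_instructions(code: bytes):
    """Split a little-endian RV32IC byte stream into (offset, length, value) tuples."""
    out = []
    pc = 0
    while pc + 2 <= len(code):
        parcel = int.from_bytes(code[pc:pc + 2], "little")
        if parcel & 0b11 != 0b11:
            # Lowest two bits are not 11: 16-bit compressed instruction.
            out.append((pc, 2, parcel))
            pc += 2
        else:
            # Lowest two bits are 11: 32-bit instruction (needs two more bytes).
            if pc + 4 > len(code):
                break
            word = int.from_bytes(code[pc:pc + 4], "little")
            out.append((pc, 4, word))
            pc += 4
    return out


# c.nop (0x0001), then nop = addi x0, x0, 0 (0x00000013), then c.nop again.
stream = bytes.fromhex("0100" "13000000" "0100")
decoded = split_instructions(stream)

for offset, length, value in decoded:
    print(offset, length, hex(value))
```

The 32-bit `nop` starts at byte offset 2, so it straddles a 4-byte boundary; a hardware fetch unit has to handle such misaligned 32-bit instructions, which is one source of the added logic.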
Pub Date : 2021-12-01 DOI: 10.1109/MCSoC51149.2021.00015
Takaharu Suzuki, Kiyofumi Tanaka
In scheduling algorithms based on the Rate Monotonic (RM) method, which is widely used in the development of real-time systems, tasks with shorter periods have higher priorities. In contrast, tasks with longer periods are likely to suffer from increased response times and jitter because of their lower priorities. We previously proposed the Execution Right Delegation (ERD) method for uniprocessor systems based on RM, in which a high-priority server for a privileged (or important) task is introduced to shorten the response times of that task. In this paper, we propose an extended ERD method for multiprocessor systems. Our system model is based on partitioned systems in which only a privileged task can migrate. The evaluation confirms that the response times of a privileged task are reduced compared with partitioned Fixed-Task-Priority (FTP) and global FTP scheduling.
Title: Execution Right Delegation Scheduling Algorithm for Multiprocessor
Venue: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
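The effect that motivates ERD, namely long-period tasks seeing long response times under RM, can be reproduced with the standard uniprocessor response-time analysis for fixed-priority scheduling. The sketch below shows plain RM only and is not the ERD algorithm; the function name `rm_response_times`, the task set, and the assumption that each deadline equals the period are all illustrative.

```python
def rm_response_times(tasks):
    """Worst-case response times under Rate Monotonic scheduling on one processor.

    tasks: list of (wcet, period) integer pairs; deadlines are assumed equal to periods.
    Returns (wcet, period, response) tuples in priority order, with response set to
    None when the iteration exceeds the period (the task is not schedulable).
    """
    ordered = sorted(tasks, key=lambda t: t[1])  # shorter period = higher priority
    results = []
    for i, (wcet, period) in enumerate(ordered):
        higher = ordered[:i]
        r = wcet
        while True:
            # Own execution time plus preemption by every higher-priority task;
            # -(-r // t) is integer ceiling division.
            nxt = wcet + sum(-(-r // t) * c for c, t in higher)
            if nxt == r or nxt > period:
                r = nxt
                break
            r = nxt
        results.append((wcet, period, r if r <= period else None))
    return results


# Hypothetical task set given as (worst-case execution time, period).
task_set = [(3, 12), (1, 4), (2, 6)]
analysis = rm_response_times(task_set)

for wcet, period, response in analysis:
    print(f"C={wcet} T={period} R={response}")
```

For this task set the task with period 12 needs only 3 time units of execution but has a worst-case response time of 10, while the shortest-period task responds in 1.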