Pub Date: 2023-11-01
DOI: 10.1016/j.micpro.2023.104970
Mohammad Masdari, Sultan Noman Qasem, Hao-Ting Pai
Network on Chip (NoC) is a technology that combines several processing elements with the necessary communication facilities to meet the ever-growing need for processing power. Metaheuristic algorithms are important tools for dealing with NP-hard problems in many domains, and many frameworks use them in the NoC context to optimize various characteristics of NoC environments. Nonetheless, a comprehensive survey of such schemes is lacking. To fill this gap, this article presents a comprehensive survey and classification of the metaheuristic-based schemes designed for various NoC topologies. First, background knowledge is provided to help in understanding the studied schemes. Then, a taxonomy of the investigated approaches based on their applied metaheuristic algorithms is presented; in each category, the schemes are studied and their main contributions, properties, and limitations are discussed. Finally, a comparison of the techniques, tools, and methods used in the studied schemes is provided, along with concluding remarks and future research directions.
{"title":"Optimizing Network-on-Chip using metaheuristic algorithms: A comprehensive survey","authors":"Mohammad Masdari, Sultan Noman Qasem, Hao-Ting Pai","doi":"10.1016/j.micpro.2023.104970","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104970","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104970"},"publicationDate":"2023-11-01","publicationTypes":"Journal Article"}
Pub Date: 2023-11-01
DOI: 10.1016/j.micpro.2023.104967
Mehrdad Saadatmand, Muhammad Abbas, Eduard Paul Enoiu, Bernd-Holger Schlingloff, Wasif Afzal, Benedikt Dornauer, Michael Felderer
Software systems are often built in increments, with additional features or enhancements added on top of existing products. This incremental development may cause certain quality aspects to deteriorate. In other words, software can be considered an evolving entity that exhibits different quality characteristics as it is updated over time with new features or deployed in different operational environments. Approaching software development with this awareness of quality evolution over time can be a key factor in the long-term success of a company in today’s highly competitive market of industrial software-intensive products. It is therefore important to be able to accurately analyze and determine the quality implications of each change and increment to a software system. To address this challenge, the multinational SmartDelta project develops automated solutions for the quality assessment of product deltas in a continuous engineering environment. The project provides smart analytics from development artifacts and system executions, offering insights into quality degradation or improvements across different product versions and providing recommendations for the next builds. This paper presents the challenges in incremental software development tackled in the SmartDelta project, the solutions that are produced and planned in the project, and the industrial impact of the project for software-intensive industrial systems.
{"title":"SmartDelta project: Automated quality assurance and optimization across product versions and variants","authors":"Mehrdad Saadatmand, Muhammad Abbas, Eduard Paul Enoiu, Bernd-Holger Schlingloff, Wasif Afzal, Benedikt Dornauer, Michael Felderer","doi":"10.1016/j.micpro.2023.104967","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104967","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104967"},"publicationDate":"2023-11-01","publicationTypes":"Journal Article","openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0141933123002119/pdfft?md5=f2f4c77923b79d0a277b67398c986b39&pid=1-s2.0-S0141933123002119-main.pdf"}
Pub Date: 2023-11-01
DOI: 10.1016/j.micpro.2023.104971
Ziwei Wang, Wei Li, Ziqi Shuai, Qingan Li
Phase Change Memory (PCM) is considered a promising replacement for DRAM due to characteristics such as low leakage power, high integration density, byte addressability and non-volatility. However, PCM’s limited write endurance significantly hinders its wide application. For example, PCM wears out quickly under the traditional dynamic memory allocation policy of embedded systems, which concentrates many writes in a few memory blocks. To extend the lifespan of PCM, several wear-aware dynamic memory allocators have been proposed, which generally depend on fixed parameters to limit the wear of PCM. However, these allocators can be inflexible because it is difficult to specify appropriate parameter values for different scenarios. In this paper, we propose a Self-Adaptive Generational Wear-Aware Allocator (GWalloc). GWalloc divides memory blocks into two generations, the young and the old generation, according to the number of times they have been allocated. The wear threshold caps the wear of young memory blocks, and GWalloc dynamically adjusts this threshold during allocations so that it can effectively balance the wear degree of PCM against the consumed memory space. Experimental evaluations show that, compared with the state-of-the-art wear-aware dynamic memory allocators NVMalloc, Walloc and UWLalloc, GWalloc improves PCM wear-leveling (evaluated by CV, a wear-leveling indicator) by 38.6%, 39.1% and 38.3%, and reduces memory space overhead by 62.1%, 22.2% and 37.2%, respectively.
{"title":"GWalloc: A self-adaptive generational wear-aware allocator for non-volatile main memory","authors":"Ziwei Wang, Wei Li, Ziqi Shuai, Qingan Li","doi":"10.1016/j.micpro.2023.104971","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104971","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104971"},"publicationDate":"2023-11-01","publicationTypes":"Journal Article"}
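The generational idea in the GWalloc abstract can be illustrated with a toy model. The sketch below is not the authors' allocator: the class name, the first-fit choice among young blocks, and the fixed `threshold_step` are all assumptions made for illustration. It only shows how a wear threshold can split free blocks into young and old, and how raising the threshold when no young block is free keeps allocation counts close together.

```python
class GenerationalAllocator:
    """Toy wear-aware allocator (hypothetical, for illustration only).

    A block is 'young' while its allocation count is below the wear
    threshold and 'old' otherwise. Only young blocks are handed out.
    """

    def __init__(self, num_blocks, wear_threshold=2, threshold_step=2):
        self.alloc_counts = [0] * num_blocks   # per-block wear estimate
        self.in_use = set()
        self.wear_threshold = wear_threshold   # assumed starting value
        self.threshold_step = threshold_step   # assumed adaptation step

    def _young_free_blocks(self):
        return [b for b, count in enumerate(self.alloc_counts)
                if b not in self.in_use and count < self.wear_threshold]

    def allocate(self):
        if len(self.in_use) == len(self.alloc_counts):
            raise MemoryError("no free block")
        young = self._young_free_blocks()
        # Self-adaptation: if every free block is old, raise the threshold
        # until at least one free block counts as young again.
        while not young:
            self.wear_threshold += self.threshold_step
            young = self._young_free_blocks()
        block = young[0]                       # first fit among young blocks
        self.alloc_counts[block] += 1
        self.in_use.add(block)
        return block

    def free(self, block):
        self.in_use.discard(block)


allocator = GenerationalAllocator(num_blocks=4)
for _ in range(8):                             # repeated allocate/free cycles
    allocator.free(allocator.allocate())
print(allocator.alloc_counts)                  # → [2, 2, 2, 2]
```

A plain first-fit allocator would have placed all eight allocations on block 0; here the threshold forces the writes to spread across the four blocks.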
Pub Date: 2023-11-01
DOI: 10.1016/j.micpro.2023.104961
Hongbo Xie, Yincheng Qi, Farah Qasim Ahmed Alyousuf
Communication links forming secure telecommunications networks rely on technologies such as message switching, circuit switching, or packet switching to transmit messages and data. Hamming codes, a family of linear error-correcting codes, are commonly used in communication networks to correct single-bit errors and, in their extended form, to also detect two-bit errors. However, reducing power consumption, occupied area, and latency in secure telecommunication networks remains a challenge for future information and communication technology. Emerging technologies such as quantum dots offer potential solutions. Quantum-dot cellular automata (QCA) is a promising nanotechnology for enhancing secure telecommunications networks, as it makes high-performance, energy-efficient digital circuits possible. This research uses QCA to introduce two new components: a 3-8 decoder with a single-layer layout and a 3-input XOR gate with a multi-layer configuration. These components are used to design a QCA-based circuit for Hamming codes. Practical implementation in real-world scenarios remains challenging due to the nature of QCA technology, so the evaluation and validation of the proposed designs rely heavily on simulations using QCADesigner. While experimental validation in real-world scenarios is limited, the simulations provide insight into the functionality and feasibility of the suggested designs. The proposed QCA-based Hamming code circuit significantly reduces cell count, occupied area, and clock latency. The suggested design can be adapted to different generator matrices in Hamming codes without drastic modifications to the underlying architecture.
{"title":"Designing an ultra-efficient Hamming code generator circuit for a secure nano-telecommunication network","authors":"Hongbo Xie, Yincheng Qi, Farah Qasim Ahmed Alyousuf","doi":"10.1016/j.micpro.2023.104961","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104961","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104961"},"publicationDate":"2023-11-01","publicationTypes":"Journal Article"}
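For readers unfamiliar with how a Hamming code generator works at the bit level, the following software sketch may help. It is not the QCA circuit from the entry above: it assumes one particular systematic Hamming(7,4) generator matrix `G = [I | P]`, and the function names are illustrative. Each parity bit is an XOR of selected data bits, which is why XOR gates are the core building block of a hardware encoder, and swapping in a different `G` changes the code without changing the procedure.

```python
# One possible systematic generator matrix G = [I4 | P] for Hamming(7,4).
# This specific matrix is an assumption chosen for the example.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def encode(data, G):
    """Multiply the data row vector by G over GF(2)."""
    return [sum(d & g for d, g in zip(data, column)) % 2 for column in zip(*G)]


def parity_check_matrix(G):
    """Build H = [P^T | I] from a systematic G = [I | P]."""
    k, n = len(G), len(G[0])
    r = n - k
    P = [row[k:] for row in G]
    return [[P[i][j] for i in range(k)] + [int(j == c) for c in range(r)]
            for j in range(r)]


def syndrome(word, H):
    """Compute H * word^T over GF(2); all zeros means no detected error."""
    return [sum(w & h for w, h in zip(word, row)) % 2 for row in H]


def correct(word, H):
    """Correct at most one flipped bit by matching the syndrome to a column of H."""
    s = syndrome(word, H)
    fixed = list(word)
    if any(s):
        position = list(zip(*H)).index(tuple(s))
        fixed[position] ^= 1
    return fixed


H = parity_check_matrix(G)
codeword = encode([1, 0, 1, 1], G)
print(codeword)                      # → [1, 0, 1, 1, 0, 1, 0]
```

Because every column of `H` is distinct and nonzero, a single flipped bit produces a syndrome that identifies its position uniquely.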
Recognition of the need for advanced tools and techniques to secure network infrastructure against security risks has prompted the development of many machine learning-based intrusion detection strategies. However, improving an Intrusion Detection System while meeting the desired advantages and constraints remains a major challenge for researchers. This paper develops a soft computing framework using Grey Wolf Optimization and an Entropy-Based Graph (GWO-EBG) to classify intrusion detection datasets and reduce the false rate. In the proposed scheme, the input data is first preprocessed by data transformation and normalization. Next, optimal features are chosen from the preprocessed data for dimension reduction using the grey wolf optimization (GWO) algorithm. Then, the entropy value is estimated from the optimally selected features. Lastly, an Entropy-Based Graph (EBG) is constructed to classify data as intrusion or normal. The experimental results demonstrate that the developed method outperforms existing methods on various performance measures. The detection rate of the developed GWO-EBG is 94.6%, which is higher than the 91.24% of EBG, 75.60% of K-Nearest Neighbors (KNN), 73.36% of Support Vector Machine (SVM), and 74.88% of Generalized Regression Neural Network (GRNN) on 5000 connection vectors obtained from the KDD CUP’99 testing dataset. The false-positive rate of the developed strategy (GWO-EBG) is 0.35%, which is lower than the 2.18% of EBG, 7.32% of KNN, 8.15% of SVM, and 8.13% of GRNN on the 5000 test vectors.
{"title":"A framework for detection of cyber attacks by the classification of intrusion detection datasets","authors":"Durgesh Srivastava, Rajeshwar Singh, Chinmay Chakraborty, Sunil Kr. Maakar, Aaisha Makkar, Deepak Sinwar","doi":"10.1016/j.micpro.2023.104964","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104964","journal":{"name":"Microprocessors and Microsystems","volume":"105","pages":"Article 104964"},"publicationDate":"2023-10-22","publicationTypes":"Journal Article"}
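The two figures reported in the intrusion detection entry above, detection rate and false-positive rate, are usually derived from a confusion matrix. The sketch below assumes the common definitions (detection rate = TP / (TP + FN), false-positive rate = FP / (FP + TN)); the paper may define them slightly differently, and the labels in the example are made up, not taken from KDD CUP'99.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate).

    Labels: 1 = intrusion, 0 = normal traffic.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    detection_rate = tp / (tp + fn)          # share of intrusions caught
    false_positive_rate = fp / (fp + tn)     # share of normal traffic flagged
    return detection_rate, false_positive_rate


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(detection_metrics(y_true, y_pred))     # → (0.75, 0.25)
```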
Pub Date: 2023-10-14
DOI: 10.1016/j.micpro.2023.104963
Rajesh Kumar Garg, Surender Kumar Soni, S. Vimal, Gaurav Dhiman
In Wireless Sensor Networks, a large number of sensor nodes are distributed in the monitoring area to increase fault tolerance, coverage and communication range. In a highly dense network, many nodes belong to a common sensing region and record almost the same data about an event. The base station, however, can identify the event features from the data of a few representative nodes of the sensing region. The battery power of some sensor nodes can therefore be saved by not sending multiple copies of the sensed information. To reduce the number of transmitting nodes in the sensing region, an analytical model is presented that segregates the whole network into groups of correlated regions. The minimum number of transmitting nodes is selected from a probability-based deployment of sensor nodes in a 3D scenario, and the rest of the nodes are operated in sleep mode to save battery power. The effectiveness of the proposed models is demonstrated with the established CHEF technique (Cluster Head Election using Fuzzy Logic). Results show that the number of nodes transmitting data from the sensing region can be reduced considerably with respect to the threshold correlation value (ξ), which saves the energy of additional nodes and extends network life. With the proposed models, at ξ ≤ 0.5 at most 87% of the nodes transmit, which saves the battery power of at least 13% of the nodes.
{"title":"3-D spatial correlation model for reducing the transmitting nodes in densely deployed WSN","authors":"Rajesh Kumar Garg, Surender Kumar Soni, S. Vimal, Gaurav Dhiman","doi":"10.1016/j.micpro.2023.104963","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104963","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104963"},"publicationDate":"2023-10-14","publicationTypes":"Journal Article"}
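The idea of letting a few representative nodes speak for a correlated region can be sketched as follows. Everything specific here is an assumption rather than the model from the WSN entry above: the exponential correlation function `exp(-d / theta)`, the parameter `theta`, and the greedy selection order are illustrative choices. A node becomes a transmitter only if its correlation with every already selected transmitter is below the threshold ξ; otherwise it can sleep.

```python
import math


def correlation(a, b, theta=10.0):
    """Assumed spatial correlation between two 3-D positions (1.0 = identical)."""
    return math.exp(-math.dist(a, b) / theta)


def select_transmitters(nodes, xi, theta=10.0):
    """Greedy pick of representative nodes for a correlation threshold xi.

    Returns indices of nodes that must transmit; the rest may sleep because
    some transmitter is correlated with them at level xi or higher.
    """
    transmitters = []
    for index, position in enumerate(nodes):
        if all(correlation(position, nodes[t], theta) < xi for t in transmitters):
            transmitters.append(index)
    return transmitters


# Two tight pairs of nodes, far apart from each other (made-up coordinates).
nodes = [(0, 0, 0), (1, 0, 0), (50, 0, 0), (50, 1, 1)]
print(select_transmitters(nodes, xi=0.5))    # → [0, 2]
print(select_transmitters(nodes, xi=0.95))   # → [0, 1, 2, 3]
```

In this toy model a stricter threshold lets fewer nodes sleep, so the energy saving depends directly on the chosen ξ.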
Pub Date: 2023-10-14
DOI: 10.1016/j.micpro.2023.104950
Luis Gerardo de la Fraga, Brisbane Ovilla-Martínez, Esteban Tlelo-Cuautle
The implementation of an Echo State Neural Network (ESNN) for chaotic time series prediction is introduced. The ESNN is first simulated using floating-point arithmetic and then fixed-point arithmetic. The ESNN is synthesized in a field-programmable gate array (FPGA), in which the activation function of the neurons’ outputs is a hyperbolic tangent, approximated with a new design based on quadratic B-splines and four integer multipliers. The FPGA implementation of the ESNN is applied to predict four chaotic time series associated with the Lorenz, Chua, Lü, and Rössler chaotic oscillators. The experimental results show that with 50 hidden neurons, fixed-point arithmetic is good enough when using 15 or 16 bits in the fractional part: using more bits does not reduce the mean-squared prediction error. The neurons are limited to four inputs in the hidden layer to achieve a more efficient hardware implementation, guaranteeing a prediction of more than 10 steps ahead.
{"title":"Echo state network implementation for chaotic time series prediction","authors":"Luis Gerardo de la Fraga, Brisbane Ovilla-Martínez, Esteban Tlelo-Cuautle","doi":"10.1016/j.micpro.2023.104950","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104950","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104950"},"publicationDate":"2023-10-14","publicationTypes":"Journal Article"}
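The remark in the echo state network entry above about 15 or 16 fractional bits can be made concrete with a small fixed-point sketch. This is a generic illustration, not the authors' FPGA design: the helper names are made up, and the B-spline tanh approximation is not reproduced. With round-to-nearest, a value stored with `n` fractional bits is off by at most `2**-(n+1)`, which is about 1.5e-5 for 15 bits.

```python
import math


def to_fixed(x, frac_bits):
    """Quantize a real value to an integer with `frac_bits` fractional bits."""
    return round(x * (1 << frac_bits))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers and rescale to the same format.

    The right shift truncates toward negative infinity, as an integer
    multiplier followed by a bit shift would.
    """
    return (a * b) >> frac_bits


def quantization_error(x, frac_bits):
    """Absolute error introduced by storing x in fixed point."""
    return abs(to_float(to_fixed(x, frac_bits), frac_bits) - x)


value = math.tanh(0.3)
for bits in (8, 15, 16):
    # Round-to-nearest keeps the error within half of one least significant bit.
    assert quantization_error(value, bits) <= 2.0 ** -(bits + 1)

half = to_fixed(0.5, 15)
print(to_float(fixed_mul(half, half, 15), 15))   # → 0.25
```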
Pub Date: 2023-10-11
DOI: 10.1016/j.micpro.2023.104952
Shu-Yen Lin, Jung-Chuan Chiang
In the development of neural networks (NNs), the activation function has become increasingly important, since its selection indirectly affects convergence speed and accuracy. This study proposes the multi-mode activation function design (MMAFD), based on the least square method (LSM) with a controllable maximum absolute error (MAE), to support multiple activation functions. MMAFD selects the activation function so as to maintain accuracy for different deep learning applications. MMAFD is implemented in TSMC 90 nm CMOS technology, where its power consumption is 0.98 mW, its operating frequency is 250 MHz, and its area is 0.416 mm². MMAFD is also verified on a Xilinx Spartan-6 XC6SLX45 development board. Compared with related works verified on FPGA boards, the LUTs and slice registers are reduced by up to 62.96% and 73.90%, respectively.
{"title":"Low-area architecture design of multi-mode activation functions with controllable maximum absolute error for neural network applications","authors":"Shu-Yen Lin, Jung-Chuan Chiang","doi":"10.1016/j.micpro.2023.104952","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104952","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104952"},"publicationDate":"2023-10-11","publicationTypes":"Journal Article"}
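To make "least square method with a controllable maximum absolute error" in the MMAFD entry above more tangible, here is a minimal software sketch. It is an assumption-laden illustration rather than the MMAFD architecture: it fits one least-squares line per equal-width segment of a sigmoid and doubles the number of segments until the largest error at the sample points falls below a target. The function names, the equal-width segmentation, and the doubling strategy are all choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_piecewise(func, lo, hi, max_abs_error, samples=64):
    """Fit one line per segment, doubling segments until the error target is met.

    Returns (coefficients, worst_error); the error is measured only at the
    sample points used for fitting.
    """
    segments = 1
    while True:
        width = (hi - lo) / segments
        coefficients, worst = [], 0.0
        for s in range(segments):
            start = lo + s * width
            xs = [start + width * i / (samples - 1) for i in range(samples)]
            ys = [func(x) for x in xs]
            slope, intercept = fit_line(xs, ys)
            worst = max(worst, max(abs(slope * x + intercept - y)
                                   for x, y in zip(xs, ys)))
            coefficients.append((slope, intercept))
        if worst <= max_abs_error:
            return coefficients, worst
        segments *= 2


coefficients, worst = fit_piecewise(sigmoid, -8.0, 8.0, max_abs_error=0.01)
print(len(coefficients), worst)
```

Tightening `max_abs_error` trades more stored coefficients (area) for accuracy, which is the kind of trade-off a controllable error bound exposes.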
Pub Date: 2023-10-09
DOI: 10.1016/j.micpro.2023.104951
Leonardo Faix Pordeus, André Eugenio Lazzaretti, Robson Ribeiro Linhares, Jean Marcelo Simão
The Notification Oriented Paradigm (NOP) is an alternative way to develop and execute applications. The NOP brings a new inference concept based on minimal collaborative entities that notify one another precisely. This inference implicitly yields decoupled solutions, thereby enabling parallelism at as fine a granularity as the envisaged computational platform allows. Previous research has proposed a digital circuit solution based on the NOP model, called NOP to Digital Hardware (NOP-DH), as a prototype High-Level Synthesis (HLS) tool. The results with NOP-DH were encouraging. However, previous NOP-DH works lack benchmarks that compare well-known algorithms against known HLS tools, such as the commercial Vivado HLS tool. This work evaluates NOP-DH applied to the well-known Random Forest algorithm, a popular Machine Learning algorithm used in many classification and regression applications. Because the Random Forest algorithm involves a high number of logic-causal evaluations that can run in parallel, it is suitable for the envisaged benchmark. Experiments were performed to compare NOP-DH and two Vivado HLS approaches (an ad hoc code and a hls4ml tool-based code) in terms of performance, amount of logic elements, maximum frequency, and number of predictions per second. The experiments demonstrated that NOP-DH circuits achieve better results in the number of logic elements and prediction rates, with some scalability limitations as a drawback. On average, NOP-DH uses 52.5% fewer resources, and its number of predictions per second is 4.7 times higher than that of Vivado HLS. Finally, our codes are made publicly available at https://nop.dainf.ct.utfpr.edu.br/nop-public/nop-dh-random-forest-algorithm.
{"title":"Notification Oriented Paradigm to Digital Hardware — A benchmark evaluation with Random Forest algorithm","authors":"Leonardo Faix Pordeus, André Eugenio Lazzaretti, Robson Ribeiro Linhares, Jean Marcelo Simão","doi":"10.1016/j.micpro.2023.104951","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104951","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104951"},"PeriodicalIF":2.6,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","isOpenAccess":false}
Pub Date : 2023-10-07 DOI: 10.1016/j.micpro.2023.104949
Svein Anders Tunheim, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo
The Tsetlin Machine (TM) is a machine learning algorithm based on an ensemble of Tsetlin Automata (TAs) that learns propositional logic expressions from Boolean input features. This paper presents the design and implementation of a Field Programmable Gate Array (FPGA) accelerator based on the Convolutional Tsetlin Machine (CTM). The accelerator classifies 4 × 4 Boolean images into two pattern classes using a 2 × 2 convolution window. Specifically, there are two separate TMs, one per class. Each TM comprises 40 propositional logic formulas, denoted as clauses, which are conjunctions of literals. Include/exclude actions from the TAs determine which literals are included in each clause. The accelerator supports full training, including random patch selection during convolution based on parallel reservoir sampling across all clauses. The design is implemented on a Xilinx Zynq XC7Z020 FPGA platform. With an operating clock speed of 40 MHz, the accelerator achieves a classification rate of 4.4 million images per second with an energy per classification of 0.6 μJ. The mean test accuracy is 99.9% when trained on the 2-dimensional Noisy XOR dataset with 40% noise in the training labels. To achieve this performance, which is on par with the original software implementation, Linear Feedback Shift Register (LFSR) random number generators of at least 16 bits are required. The solution demonstrates the core principles of a CTM and can be scaled to operate on multi-class systems for larger images.
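The mechanisms named in this abstract (clauses as conjunctions of literals, a 2 × 2 window sliding over a 4 × 4 image, reservoir sampling of a matching patch, and a 16-bit LFSR as the random source) can be sketched in Python as follows. This is a software illustration under assumptions, not the accelerator's design: the clause `DIAGONAL`, the example image, the LFSR tap choice (16, 14, 13, 11), and all function names are hypothetical. The sketch also leaves out training, the per-class vote sums, and any literals beyond the patch pixels themselves.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def patches(image, window=2):
    """Return every window x window patch of a square image, flattened row by row."""
    n = len(image)
    return [
        tuple(image[r + dr][c + dc] for dr in range(window) for dc in range(window))
        for r in range(n - window + 1)
        for c in range(n - window + 1)
    ]


def clause_matches(clause, patch):
    """A clause is a conjunction of literals: plain features and negated features."""
    must_be_one, must_be_zero = clause
    return (all(patch[i] == 1 for i in must_be_one)
            and all(patch[i] == 0 for i in must_be_zero))


def clause_output(clause, image):
    """A convolutional clause fires when it matches at least one patch."""
    return any(clause_matches(clause, p) for p in patches(image))


def pick_matching_patch(clause, image, state):
    """Reservoir sampling with a reservoir of one: the k-th matching patch
    replaces the current choice with probability of about 1/k.
    The modulo reduction is slightly biased, which is acceptable for a sketch."""
    chosen, seen = None, 0
    for index, patch in enumerate(patches(image)):
        if clause_matches(clause, patch):
            seen += 1
            state = lfsr16_step(state)
            if state % seen == 0:
                chosen = index
    return chosen, state


# Hypothetical clause: patch pixels 0 and 3 set, pixels 1 and 2 clear,
# i.e. the 2 x 2 pattern [[1, 0], [0, 1]].
DIAGONAL = ((0, 3), (1, 2))

image = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

print(len(patches(image)), clause_output(DIAGONAL, image))
print(pick_matching_patch(DIAGONAL, image, 0xACE1)[0])
```

A 4 × 4 image holds (4 − 2 + 1)² = 9 patches of size 2 × 2. The reported 40 MHz clock and 4.4 million images per second work out to roughly 9.1 clock cycles per image, which is compatible with visiting about one patch per cycle, although the abstract does not state this.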
{"title":"Convolutional Tsetlin Machine-based Training and Inference Accelerator for 2-D Pattern Classification","authors":"Svein Anders Tunheim, Lei Jiao, Rishad Shafik, Alex Yakovlev, Ole-Christoffer Granmo","doi":"10.1016/j.micpro.2023.104949","DOIUrl":"https://doi.org/10.1016/j.micpro.2023.104949","journal":{"name":"Microprocessors and Microsystems","volume":"103","pages":"Article 104949"},"PeriodicalIF":2.6,"publicationDate":"2023-10-07","publicationTypes":"Journal Article","isOpenAccess":false}