Mark Vousden, Jordan Morris, Graeme McLachlan Bragg, Jonathan Beaumont, Ashur Rafiev, Wayne Luk, David Thomas, Andrew Brown
This paper introduces an event-based computing paradigm, where workers only perform computation in response to external stimuli (events). This approach is best employed on hardware with many thousands of small compute cores connected by a fast, low-latency interconnect, as opposed to traditional computers with fewer, faster cores. Event-based computing is timely because it provides an alternative to traditional big computing, which suffers from immense infrastructural and power costs. This paper presents four case study applications, including problems in computational chemistry and condensed matter physics, where an event-based computing approach finds solutions orders of magnitude more quickly than the equivalent traditional big compute approach.
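To make the event-driven model concrete, the sketch below shows the paradigm in miniature: workers hold local state and compute only inside a handler fired by an incoming message. All names here (Device, on_receive, run) are illustrative assumptions, not the authors' API.

```python
# Minimal sketch of event-based computing: devices compute only when a
# message (event) arrives, and computation ends when no events remain.
from collections import deque

class Device:
    def __init__(self, name):
        self.name = name
        self.state = 0.0
        self.neighbours = []

    def on_receive(self, value):
        """Event handler: the only place any computation happens."""
        old = self.state
        self.state = 0.5 * (self.state + value)  # toy relaxation update
        if abs(self.state - old) > 1e-6:         # propagate significant changes
            return [(n, self.state) for n in self.neighbours]
        return []                                # quiescent: emit nothing

def run(devices, initial_events):
    queue = deque(initial_events)                # pending (device, value) events
    while queue:
        device, value = queue.popleft()
        queue.extend(device.on_receive(value))

a, b, c = Device("a"), Device("b"), Device("c")
a.neighbours, b.neighbours, c.neighbours = [b], [c], [a]
run([a, b, c], [(a, 1.0)])                       # one seed event drives the ring
```

On real event-based hardware the queue is replaced by the interconnect and each device maps to a core or thread; the key property illustrated here is that idle workers consume no compute until an event reaches them.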
{"title":"Event-based high throughput computing: A series of case studies on a massively parallel softcore machine","authors":"Mark Vousden, Jordan Morris, Graeme McLachlan Bragg, Jonathan Beaumont, Ashur Rafiev, Wayne Luk, David Thomas, Andrew Brown","doi":"10.1049/cdt2.12051","DOIUrl":"https://doi.org/10.1049/cdt2.12051","url":null,"abstract":"<p>This paper introduces an event-based computing paradigm, where workers only perform computation in response to external stimuli (events). This approach is best employed on hardware with many thousands of smaller compute cores with a fast, low-latency interconnect, as opposed to traditional computers with fewer and faster cores. Event-based computing is timely because it provides an alternative to traditional big computing, which suffers from immense infrastructural and power costs. This paper presents four case study applications, where an event-based computing approach finds solutions to orders of magnitude more quickly than the equivalent traditional big compute approach, including problems in computational chemistry and condensed matter physics.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"17 1","pages":"29-42"},"PeriodicalIF":1.2,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12051","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50137733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A new low-power, high-speed multiplier is presented, based on the voltage over-scaling (VOS) technique and new 5:3 and 7:3 counter cells. VOS reduces power consumption in digital circuits, but its different voltage levels increase the delay in different stages of a multiplier. Hence, the proposed counters are implemented with the gate-diffusion input (GDI) technique to overcome the speed limitation of VOS-based circuits. The proposed GDI-based 5:3 and 7:3 counters save power and reduce area by 2x and 2.5x, respectively. To prevent the threshold voltage (Vth) drop in the suggested GDI-based circuits, carbon nanotube field-effect transistor (CNTFET) technology is used. In the counters, the chirality vector and tubes of the CNTFETs are properly adjusted to attain full-swing outputs with high driving capability. Their behaviour under heat distribution over different time intervals, a major issue in CNTFET technology, is also investigated, and their very low sensitivity is confirmed. The low complexity, high stability and efficient performance of the presented counter cells make the proposed VOS-CNTFET-GDI-based multiplier an alternative to previous designs.
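Logically, a 5:3 or 7:3 counter is a population counter: it sums five or seven partial-product bits of equal weight into a 3-bit result. The behavioural model below is a sketch of that logic function only; the paper's contribution is the circuit realisation (VOS, GDI, CNTFET), which no software model captures.

```python
# Behavioural model of an n:3 counter cell: it outputs the 3-bit binary
# count of ones among its n equally weighted input bits (n = 5 or 7).
def counter_n3(bits):
    assert len(bits) in (5, 7), "5:3 and 7:3 counters only"
    count = sum(bits)                      # 0..7 always fits in 3 bits
    return (count & 1, (count >> 1) & 1, (count >> 2) & 1)  # (s, c1, c2)

# Compress a column of 7 partial-product bits into 3 output bits.
s, c1, c2 = counter_n3([1, 0, 1, 1, 0, 1, 1])
assert s + 2 * c1 + 4 * c2 == 5            # weights 1, 2, 4 reconstruct the count
```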
{"title":"Voltage over-scaling CNT-based 8-bit multiplier by high-efficient GDI-based counters","authors":"Ayoub Sadeghi, Nabiollah Shiri, Mahmood Rafiee, Abdolreza Darabi, Ebrahim Abiri","doi":"10.1049/cdt2.12049","DOIUrl":"https://doi.org/10.1049/cdt2.12049","url":null,"abstract":"<p>A new low-power and high-speed multiplier is presented based on the voltage over scaling (VOS) technique and new 5:3 and 7:3 counter cells. The VOS reduces power consumption in digital circuits, but different voltage levels of the VOS increase the delay in different stages of a multiplier. Hence, the proposed counters are implemented by the gate-diffusion input technique to solve the speed limitation of the VOS-based circuits. The proposed GDI-based 5:3 and 7:3 counters save power and reduce the area by 2x and 2.5x, respectively. To prevent the threshold voltage (<i>V</i><sub>th</sub>) drop in the suggested GDI-based circuits, carbon nanotube field-effect transistor (CNTFET) technology is used. In the counters, the chirality vector and tubes of the CNTFETs are properly adjusted to attain full-swing outputs with high driving capability. Also, their validation against heat distribution under different time intervals, as a major issue in the CNTFET technology is investigated, and their very low sensitivity is confirmed. The low complexity, high stability and efficient performance of the presented counter cells introduce the proposed VOS-CNTFET-GDI-based multiplier as an alternative to the previous designs.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"17 1","pages":"1-19"},"PeriodicalIF":1.2,"publicationDate":"2022-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12049","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50146867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abbas Yaseri, Mohammad Hossein Maghami, Mehdi Radmehr
High-accuracy yield estimation is necessary when designing analogue integrated circuits. In the Monte-Carlo (MC) method, many transistor-level simulations must be performed to obtain the desired result. Therefore, MC simulation needs to be combined with other methods to reach high yield and high speed at the same time. In this paper, a four-stage yield optimisation approach is presented, which employs computational intelligence to accelerate yield estimation without losing accuracy. Firstly, designs that meet the desired characteristics are identified using critical analysis (CA). The aim of utilising CA is to avoid repeating unnecessary MC simulations for non-critical solutions. In the second and third stages, the shuffled frog-leaping algorithm and the Non-dominated Sorting Genetic Algorithm-III are applied to improve performance. Finally, MC simulations are performed to produce the final result. The yield value obtained from the simulation results for a two-stage class-AB Operational Transconductance Amplifier (OTA) in 180 nm Complementary Metal-Oxide-Semiconductor (CMOS) technology is 99.85%. The proposed method requires less computational effort and offers higher accuracy than MC-based approaches. Another advantage of using CA is that the initial population of the multi-objective optimisation algorithms is no longer random. Simulation results prove the efficiency of the proposed technique.
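The cost this paper attacks is easy to see in a plain MC loop: every yield estimate needs thousands of circuit evaluations, each of which is a transistor-level simulation in practice. The sketch below substitutes a toy analytic performance model for the simulator; the model, spec and sigma are invented for illustration.

```python
# Plain Monte-Carlo yield estimation. In a real flow eval_circuit would
# be a transistor-level simulation, which is what makes raw MC so costly.
import random

def eval_circuit(dvth):
    return 60.0 - 40.0 * dvth ** 2         # toy gain model vs. Vth shift

def mc_yield(n_samples=10_000, spec_gain=59.0, sigma=0.05):
    passed = 0
    for _ in range(n_samples):
        dvth = random.gauss(0.0, sigma)    # sampled process variation
        if eval_circuit(dvth) >= spec_gain:
            passed += 1
    return passed / n_samples

print(f"estimated yield: {mc_yield():.2%}")   # roughly 99.8% for this toy model
```

Stages such as CA and the evolutionary search aim precisely at calling this expensive inner evaluation as few times as possible.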
{"title":"A four-stage yield optimization technique for analog integrated circuits using optimal computing budget allocation and evolutionary algorithms","authors":"Abbas Yaseri, Mohammad Hossein Maghami, Mehdi Radmehr","doi":"10.1049/cdt2.12048","DOIUrl":"10.1049/cdt2.12048","url":null,"abstract":"<p>A high yield estimation is necessary for designing analogue integrated circuits. In the Monte-Carlo (MC) method, many transistor-level simulations should be performed to obtain the desired result. Therefore, some methods are needed to be combined with MC simulations to reach high yield with high speed at the same time. In this paper, a four-stage yield optimisation approach is presented, which employs computational intelligence to accelerate yield estimation without losing accuracy. Firstly, the designs that met the desired characteristics are provided using critical analysis (CA). The aim of utilising CA is to avoid unnecessary MC simulations repeating for non-critical solutions. Then in the second and third stages, the shuffled frog-leaping algorithm and the Non-dominated Sorting Genetic Algorithm-III are proposed to improve the performance. Finally, MC simulations are performed to present the final result. The yield value obtained from the simulation results for two-stage class-AB Operational Transconductance Amplifer (OTA) in 180 nm Complementary Metal-Oxide-Semiconductor (CMOS) technology is 99.85%. The proposed method has less computational effort and high accuracy than the MC-based approaches. Another advantage of using CA is that the initial population of multi-objective optimisation algorithms will no longer be random. Simulation results prove the efficiency of the proposed technique.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 5-6","pages":"183-195"},"PeriodicalIF":1.2,"publicationDate":"2022-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12048","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87528214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kushal K. Ponugoti, Sudarshan K. Srinivasan, Scott C. Smith, Nimish Mathure
With the rise of cyber warfare, the detection of hardware Trojans (malicious digital circuit components that can leak data and degrade performance) is an urgent issue. Quasi-Delay Insensitive asynchronous digital circuits, such as NULL Convention Logic (NCL) and Sleep Convention Logic, also known as Multi-Threshold NULL Convention Logic (MTNCL), have inherent security properties and resilience to large temperature fluctuations, which make them attractive for extreme-environment applications such as space exploration, automotive and the power industry. This paper shows how the dual-rail encoding used in NCL and MTNCL can be exploited to design Trojans that would not be detected using existing methods. Generic threat models for such Trojans are given. Formal verification methods capable of accurately detecting these Trojans at the Register-Transfer Level are also provided. The detection methods were tested by embedding Trojans in NCL and MTNCL Rivest-Shamir-Adleman (RSA) decryption circuits. Applied to 25 NCL and 25 MTNCL RSA benchmarks of various data path widths, they achieved a 100% detection rate.
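The attack surface comes from the encoding itself: a dual-rail NCL signal uses two wires per bit, leaving one code word unused. The sketch below shows the encoding and a toy trace checker that flags the unused code; it illustrates the general idea only and is not the paper's formal verification method.

```python
# Dual-rail (NCL) encoding, written as (rail0, rail1): (0,0) is NULL,
# (1,0) is DATA0, (0,1) is DATA1, and (1,1) is never legal. A Trojan can
# abuse the unused (1,1) code word as a covert channel, so one simple
# RTL-level check is that no signal ever takes the illegal code.
NULL, DATA0, DATA1, ILLEGAL = (0, 0), (1, 0), (0, 1), (1, 1)

def check_trace(trace):
    """Scan a simulation trace of dual-rail signal values for (1,1)."""
    for cycle, signals in enumerate(trace):
        for name, value in signals.items():
            if value == ILLEGAL:
                return f"illegal dual-rail code on '{name}' at cycle {cycle}"
    return "trace clean"

trace = [{"q": NULL}, {"q": DATA1}, {"q": ILLEGAL}]   # toy 3-cycle trace
print(check_trace(trace))
```

Trace checking of this kind is only as good as the traces exercised, which is why the paper argues for formal verification at the Register-Transfer Level rather than simulation alone.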
{"title":"Illegal Trojan design and detection in asynchronous NULL Convention Logic and Sleep Convention Logic circuits","authors":"Kushal K. Ponugoti, Sudarshan K. Srinivasan, Scott C. Smith, Nimish Mathure","doi":"10.1049/cdt2.12047","DOIUrl":"10.1049/cdt2.12047","url":null,"abstract":"<p>With Cyber warfare, detection of hardware Trojans, malicious digital circuit components that can leak data and degrade performance, is an urgent issue. Quasi-Delay Insensitive asynchronous digital circuits, such as NULL Convention Logic (NCL) and Sleep Convention Logic, also known as Multi-Threshold NULL Convention Logic (MTNCL), have inherent security properties and resilience to large fluctuations in temperatures, which make them very alluring to extreme environment applications, such as space exploration, automotive, power industry etc. This paper shows how dual-rail encoding used in NCL and MTNCL can be exploited to design Trojans, which would not be detected using existing methods. Generic threat models for Trojans are given. Formal verification methods that are capable of accurate detection of Trojans at the Register-Transfer-Level are also provided. The detection methods were tested by embedding Trojans in NCL and MTNCL Rivest-Shamir-Adleman (RSA) decryption circuits. The methods were applied to 25 NCL and 25 MTNCL RSA benchmarks of various data path width and provided 100% rate of detection.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 5-6","pages":"172-182"},"PeriodicalIF":1.2,"publicationDate":"2022-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12047","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85767635","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shabnam Mahjoub, Mehdi Golsorkhtabaramiri, Seyed Sadegh Salehi Amiri
Because computer systems are now designed in multi-core and/or multi-processor form, parallelisation makes it possible to use the full capacity of the processors and run an application in the least time. This is the responsibility of parallel compilers, which perform parallelisation in several steps, distributing loop iterations between different processors and executing them simultaneously to achieve a lower runtime. The present paper focuses on the uniformisation of three-level perfect nested loops, an important step in parallelisation, and proposes a method called Towards Three-Level Loop Parallelisation (TLP) that combines a frog-leaping algorithm with fuzzy logic. Three-level loops matter because, in recent years, many algorithms have operated on volumetric data, that is, three-dimensional spaces. Compared with existing methods, the TLP algorithm yields a wide variety of optimal results in the desired time, with the minimum cone size resulting from the vectors. Moreover, the algorithm decomposes the maximum number of input dependence vectors. These results can accelerate the generation of parallel code and facilitate its development for High-Performance Computing purposes.
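Uniformisation replaces each (possibly irregular) dependence vector of the three-level loop nest with a non-negative integer combination of a small set of basic vectors, the cone. The brute-force check below illustrates what it means for a vector to be decomposed by a cone; the cone, the example vector and the search bound are invented for illustration, and TLP's actual contribution is choosing a good cone efficiently.

```python
# Check whether a 3-D dependence vector is a non-negative integer
# combination of the cone's basic vectors (brute force, for illustration).
from itertools import product

def decomposes(dep, cone, max_coeff=10):
    for coeffs in product(range(max_coeff + 1), repeat=len(cone)):
        combo = tuple(sum(k * v[d] for k, v in zip(coeffs, cone))
                      for d in range(3))
        if combo == tuple(dep):
            return True
    return False

cone = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
print(decomposes((2, 1, 1), cone))   # True: (1,0,0) + (1,1,0) + (0,0,1)
```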
{"title":"TLP: Towards three-level loop parallelisation","authors":"Shabnam Mahjoub, Mehdi Golsorkhtabaramiri, Seyed Sadegh Salehi Amiri","doi":"10.1049/cdt2.12046","DOIUrl":"10.1049/cdt2.12046","url":null,"abstract":"<p>Due to the design of computer systems in the multi-core and/or multi-processor form, it is possible to use the maximum capacity of processors to run an application with the least time consumed through parallelisation. This is the responsibility of parallel compilers, which perform parallelisation in several steps by distributing iterations between different processors and executing them simultaneously to achieve lower runtime. The present paper focuses on the uniformisation of three-level perfect nested loops as an important step in parallelisation and proposes a method called Towards Three-Level Loop Parallelisation (TLP) that uses a combination of a Frog Leaping Algorithm and Fuzzy to achieve optimal results because in recent years, many algorithms have worked on volumetric data, that is, three-dimensional spaces. Results of the implementation of the TLP algorithm in comparison with existing methods lead to a wide variety of optimal results at desired times, with minimum cone size resulting from the vectors. Besides, the maximum number of input dependence vectors is decomposed by this algorithm. These results can accelerate the process of generating parallel codes and facilitate their development for High-Performance Computing purposes.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 5-6","pages":"159-171"},"PeriodicalIF":1.2,"publicationDate":"2022-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12046","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74517978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Geoff V. Merrett, Bernd-Christian Renner, Brandon Lucia
In order to realise the vision and scale of the Internet of Things (IoT), we cannot rely on mains electricity or batteries to power devices, due to environmental, maintenance, cost and physical volume implications. Considerable research has been undertaken in energy harvesting, allowing systems to extract electrical energy from their surrounding environments. However, such energy is typically highly dynamic, both spatially and temporally. In recent years, there has been an increase in research around how computing can be effectively performed from energy harvesting supplies, moving beyond the concepts of battery-powered and energy-neutral systems, thus enabling battery-free computing.

Challenges in battery-free computing are broad and wide-ranging, cutting across the spectrum of electronics and computer science: for example, circuits, algorithms, computer architecture, communication and networking, middleware, applications, deployments, and modelling and simulation tools.

This special issue explores the challenges, issues and opportunities in the research, design and engineering of energy-harvesting, energy-neutral and intermittent sensing systems. These are enabling technologies for future applications in smart energy, transportation, environmental monitoring and smart cities. Innovative solutions are needed to enable either uninterrupted or intermittent operation.

This special issue contains two papers on different aspects of battery-free computing, as described below.

Hanschke et al.'s article, 'EmRep: Energy Management Relying on State-of-Charge Extrema Prediction', considers energy management in energy-neutral systems, particularly those with small energy storage elements (e.g. a supercapacitor). They observe that existing energy-neutral management approaches tend to operate inefficiently when exposed to extremes in the harvesting environment, for example, wasting harvested power in times of abundant energy due to saturation of the energy storage device. To resolve this, the authors present an approach that predicts extremes in device state-of-charge (SoC) and, when such conditions are occurring, switches to a less conservative and more immediate policy for device activity (and hence consumption). This decouples the energy management of high-intake and low-intake harvest periods and ensures that saturation of the energy storage is reduced by design. The approach is thoroughly evaluated experimentally in combination with a variety of prediction algorithms, time resolutions and energy storage sizes. Promising results indicate the potential for a doubling in effective utility in systems with only small energy storage elements.

The second paper in the special issue, authored by Stricker et al., continues the theme of energy prediction by considering the impact of harvesting-source prediction errors on the system scheduler and hence the system's performance. Their article, 'Robustness of predictive energy harvesting systems: Analysis and adaptive prediction scaling', defines a novel robustness metric to characterise this effect and proposes an adaptive prediction scaling method that learns from the local environment and system behaviour, improving a non-robust system's performance by up to 13.8 times in a real-world setting.
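The EmRep idea summarised above can be caricatured in a few lines: predict the next state-of-charge extremum and, if the store is heading for saturation, spend energy more aggressively rather than clip the harvest. The predictor, thresholds and duty cycles below are illustrative assumptions, not EmRep's actual algorithm.

```python
# Toy policy switch driven by a predicted state-of-charge (SoC) extremum.
def next_soc_extremum(soc_history):
    """Crude linear extrapolation of SoC (0..1); EmRep predicts properly."""
    trend = soc_history[-1] - soc_history[-2]
    return max(0.0, min(1.0, soc_history[-1] + 5 * trend))

def choose_duty_cycle(soc_history, conservative=0.1, aggressive=0.5):
    peak = next_soc_extremum(soc_history)
    # Heading for saturation: use the harvest now instead of clipping it.
    return aggressive if peak > 0.95 else conservative

print(choose_duty_cycle([0.80, 0.88, 0.96]))   # rising fast -> 0.5 (aggressive)
```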
{"title":"Guest Editorial: Special issue on battery-free computing","authors":"Geoff V. Merrett, Bernd-Christian Renner, Brandon Lucia","doi":"10.1049/cdt2.12043","DOIUrl":"10.1049/cdt2.12043","url":null,"abstract":"<p>In order to realise the vision and scale of the Internet of Things (IoT), we cannot rely on mains electricity or batteries to power devices due to environmental, maintenance, cost and physical volume implications. Considerable research has been undertaken in energy harvesting, allowing systems to extract electrical energy from their surrounding environments. However, such energy is typically highly dynamic, both spatially and temporally. In recent years, there has been an increase in research around how computing can be effectively performed from energy harvesting supplies, moving beyond the concepts of battery-powered and energy-neutral systems, thus enabling battery-free computing.</p><p>Challenges in battery-free computing are broad and wide-ranging, cutting across the spectrum of electronics and computer science—for example, circuits, algorithms, computer architecture, communication and networking, middleware, applications, deployments, and modelling and simulation tools.</p><p>This special issue explores the challenges, issues and opportunities in the research, design, and engineering of energy-harvesting, energy-neutral and intermittent sensing systems. These are enabling technologies for future applications in smart energy, transportation, environmental monitoring and smart cities. Innovative solutions are needed to enable either uninterrupted or intermittent operation.</p><p>This special issue contains two papers on different aspects of battery-free computing, as described below.</p><p>Hanschke et al.‘s article on ‘EmRep: Energy Management Relying on State-of-Charge Extrema Prediction’ considers energy management in energy-neutral systems, particularly those with small energy storage elements (e.g. a supercapacitor). They observe that existing energy-neutral management approaches have a tendency to operate inefficiently when exposed to extremes in the harvesting environment, for example, wasting harvested power in times of abundant energy due to saturation of the energy storage device. To resolve this, the authors present an approach to predict extremes in device state-of-charge (SoC) when such conditions are occurring and hence switch to a less conservative and more immediate policy for device activity (and hence, consumption). This decouples energy management of high-intake from low-intake harvest periods and ensures that the saturation of energy storage is reduced by design. The approach is thoroughly experimentally evaluated in combination with a variety of different prediction algorithms, time resolutions, and energy storage sizes. Promising results indicate the potential for a doubling in effective utility in systems with only small energy storage elements.</p><p>The second paper in the special issue, authored by Stricker et al., continues the theme of energy prediction by considering the impact of harvesting source prediction errors on the system scheduler and hence the system's performance. 
Their article, ‘Robustness of Predict","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 4","pages":"89-90"},"PeriodicalIF":1.2,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12043","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77386084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Ali Ramazanzadeh, Behnam Barzegar, Homayun Motameni
The evolution of technology has led to the appearance of smart cities. An essential element in such cities is smart mobility, which covers subjects related to Intelligent Transportation Systems (ITS). The problem is that ITS vulnerabilities may considerably harm the quality of life and safety of people living in smart cities. In fact, software and hardware systems are increasingly exposed to security risks and threats. To reduce threats and secure software design, threat modelling has been proposed as a preventive solution in the software design phase. However, threat modelling is often criticised for being time-consuming, complex, difficult and error prone. The approach proposed in this study, Automated Security Assistant of Threat Models (ASATM), is an automated solution capable of achieving a high level of security assurance. By defining concepts and conceptual modelling, and by implementing automated security assistant algorithms, ASATM introduces a new approach to identifying threats, extracting security requirements and designing secure software. The proposed approach offers a quantitative classification of security at three levels (insecure, secure and threat), twelve sub-levels (nominal scale and colour scale) and a five-layer depth (human understandability and conditional probability). To evaluate its effectiveness, an example with various security parameters and scenarios was tested, and the results confirmed the superiority of the proposed approach over the latest threat modelling approaches in terms of method, learning and model understanding.
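As a reading aid only, the fragment below shows one way a quantitative score could be bucketed into three levels and twelve sub-levels; the actual scales, ordering and colour coding are defined in the paper, and everything here is an invented placeholder.

```python
# Hypothetical mapping of a 0..1 security score onto 3 levels and
# 12 sub-levels; the paper's own scheme is richer (colour scales,
# conditional probabilities) and is not reproduced here.
LEVELS = ["insecure", "threat", "secure"]     # ordering is an assumption

def grade(score):
    assert 0.0 <= score <= 1.0
    sub = min(11, int(score * 12))            # 12 equal-width sub-levels
    return LEVELS[min(2, sub // 4)], sub + 1  # 4 sub-levels per level

print(grade(0.30))   # ('insecure', 4)
```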
{"title":"ASATM: Automated security assistant of threat models in intelligent transportation systems","authors":"Mohammad Ali Ramazanzadeh, Behnam Barzegar, Homayun Motameni","doi":"10.1049/cdt2.12045","DOIUrl":"10.1049/cdt2.12045","url":null,"abstract":"<p>The evolution of technology has led to the appearance of smart cities. An essential element in such cities is smart mobility that covers the subjects related to Intelligent Transportation Systems (ITS). The problem is that the ITS vulnerabilities may considerably harm the life quality and safety status of human beings living in smart cities. In fact, software and hardware systems are more exposed to security risks and threats. To reduce threats and secure software design, threat modelling has been proposed as a preventive solution in the software design phase. On the other hand, threat modelling is always criticised for being time consuming, complex, difficult, and error prone. The approach proposed in this study, that is, Automated Security Assistant of Threat Models (ASATM), is an automated solution that is capable of achieving a high level of security assurance. By defining concepts and conceptual modelling as well as implementing automated security assistant algorithms, ASATM introduces a new approach to identifying threats, extracting security requirements, and designing secure software. The proposed approach demonstrates a quantitative classification of security at three levels (insecure, secure, and threat), twelve sub-levels (nominal scale and colour scale), and a five-layer depth (human understandability and conditional probability). In this study, to evaluate the effectiveness of our approach, an example with various security parameters and scenarios was tested and the results confirmed the superiority of the proposed approach over the latest threat modelling approaches in terms of method, learning, and model understanding.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 5-6","pages":"141-158"},"PeriodicalIF":1.2,"publicationDate":"2022-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12045","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76261435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing energy consumption under processor temperature constraints has recently become a pressing issue in real-time multiprocessor systems-on-chip (MPSoCs). High processor temperatures affect the power and reliability of the MPSoC, and low energy consumption is essential for real-time embedded systems, most of which are portable devices. Efficient mapping of tasks onto processors has a significant impact on reducing both energy consumption and the thermal profile of the processors, and several state-of-the-art techniques have recently been proposed for this problem. This paper proposes Q-scheduler, a novel technique based on deep Q-learning, to dispatch tasks between processors in a real-time MPSoC. Thousands of simulated tasks train Q-scheduler offline to reduce the system's power consumption under processor temperature constraints. The trained Q-scheduler then dispatches real tasks in a real-time MPSoC online, while continuing to be trained regularly. Q-scheduler can dispatch multiple tasks simultaneously in a single decision, an ability whose effect is significant, especially in harmonic real-time systems. Experimental results show that Q-scheduler reduces the energy consumption and temperature of processors on average by 15% and 10%, respectively, compared to previous state-of-the-art techniques.
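The reinforcement-learning loop behind such a dispatcher is compact even in tabular form: the state summarises core temperatures, the action picks a core, and the reward penalises heat (and, by extension, energy). The sketch below is a tabular stand-in for illustration; the paper trains a deep Q-network on a realistic MPSoC model, and every constant here is an assumption.

```python
# Tabular Q-learning caricature of temperature-aware task dispatch.
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)] -> value
ALPHA, GAMMA, EPS, N_PROCS = 0.1, 0.9, 0.1, 4

def choose(state):
    if random.random() < EPS:                          # explore
        return random.randrange(N_PROCS)
    return max(range(N_PROCS), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in range(N_PROCS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])

temps = [40.0] * N_PROCS
for _ in range(1000):                                  # one task per step
    state = tuple(int(t) // 10 for t in temps)         # coarse thermal bins
    a = choose(state)
    temps[a] += 2.0                                    # task heats chosen core
    temps = [max(35.0, t - 1.0) for t in temps]        # passive cooling
    update(state, a, -temps[a], tuple(int(t) // 10 for t in temps))
```

Over the training loop the policy learns to spread tasks across cool cores, which is the tabular analogue of the offline training phase described above.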
{"title":"Q-scheduler: A temperature and energy-aware deep Q-learning technique to schedule tasks in real-time multiprocessor embedded systems","authors":"Mahsa Mohammadi, Hakem Beitollahi","doi":"10.1049/cdt2.12044","DOIUrl":"10.1049/cdt2.12044","url":null,"abstract":"<p>Reducing energy consumption under processors' temperature constraints has recently become a pressing issue in real-time multiprocessor systems on chips (MPSoCs). The high temperature of processors affects the power and reliability of the MPSoC. Low energy consumption is necessary for real-time embedded systems, as most of them are portable devices. Efficient task mapping on processors has a significant impact on reducing energy consumption and the thermal profile of processors. Several state-of-the-art techniques have recently been proposed for this issue. This paper proposes Q-scheduler, a novel technique based on the deep Q-learning technology, to dispatch tasks between processors in a real-time MPSoC. Thousands of simulated tasks train Q-scheduler offline to reduce the system's power consumption under temperature constraints of processors. The trained Q-scheduler dispatches real tasks in a real-time MPSoC online while also being trained regularly online. Q-scheduler dispatches multiple tasks in the system simultaneously with a single process; the effectiveness of this ability is significant, especially in a harmonic real-time system. Experimental results illustrate that Q-scheduler reduces energy consumption and temperature of processors on average by 15% and 10%, respectively, compared to previous state-of-the-art techniques.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 4","pages":"125-140"},"PeriodicalIF":1.2,"publicationDate":"2022-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12044","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79367644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Internet of Things (IoT) systems can rely on energy harvesting to extend battery lifetimes or even render batteries obsolete. Such systems employ an energy scheduler to optimise their behaviour, and thus their performance, by adapting the system's operation. Predictive models of harvesting sources, which are inherently non-deterministic and consequently challenging to predict, are often necessary for the scheduler to optimise performance. Because these inevitably inaccurate predictions are used by the scheduler, the predictive model's accuracy impacts the scheduler and hence system performance. This fact has largely been overlooked in the vast body of available results on energy schedulers and predictors for harvesting-based systems. The authors systematically describe the effect prediction errors have on the scheduler, and thus on system performance, by defining a novel robustness metric. To alleviate the severe impact prediction errors can have on system performance, the authors propose an adaptive prediction scaling method that learns from the local environment and system behaviour. The authors demonstrate the concept of robustness with datasets from both outdoor and indoor scenarios, and highlight the improvement and overhead of the proposed adaptive prediction scaling method for both. It improves a non-robust system's performance by up to 13.8 times in a real-world setting.
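One simple way to picture adaptive prediction scaling is a multiplicative correction factor learned online from the ratio of harvested to predicted energy. The exponentially weighted update below is an illustrative assumption; the paper defines its own scaling method and robustness metric.

```python
# Online multiplicative correction of an energy-harvesting predictor.
class ScaledPredictor:
    def __init__(self, base_predict, lr=0.1):
        self.base_predict = base_predict   # underlying harvest model
        self.scale = 1.0                   # learned correction factor
        self.lr = lr

    def predict(self, t):
        return self.scale * self.base_predict(t)

    def observe(self, t, harvested):
        predicted = self.base_predict(t)
        if predicted > 0:
            ratio = harvested / predicted
            self.scale += self.lr * (ratio - self.scale)  # move toward ratio

p = ScaledPredictor(lambda t: 10.0)        # base model always predicts 10 J
for t in range(50):
    p.observe(t, 7.0)                      # environment delivers only 7 J
print(round(p.predict(50), 2))             # ~7.0 once the scale has adapted
```

The scheduler then consumes the scaled predictions, so a systematic over- or under-prediction of the source is absorbed before it can distort scheduling decisions.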
{"title":"Robustness of predictive energy harvesting systems: Analysis and adaptive prediction scaling","authors":"Naomi Stricker, Reto Da Forno, Lothar Thiele","doi":"10.1049/cdt2.12042","DOIUrl":"10.1049/cdt2.12042","url":null,"abstract":"<p>Internet of Things (IoT) systems can rely on energy harvesting to extend battery lifetimes or even render batteries obsolete. Such systems employ an energy scheduler to optimise their behaviour and thus performance by adapting the system's operation. Predictive models of harvesting sources, which are inherently non-deterministic and consequently challenging to predict, are often necessary for the scheduler to optimise performance. Because the inaccurate predictions are utilised by the scheduler, the predictive model's accuracy inevitably impacts the scheduler and system performance. This fact has largely been overlooked in the vast amount of available results on energy schedulers and predictors for harvesting-based systems. The authors systematically describe the effect prediction errors have on the scheduler and thus system performance by defining a novel robustness metric. To alleviate the severe impact prediction errors can have on the system performance, the authors propose an adaptive prediction scaling method that learns from the local environment and system behaviour. The authors demonstrate the concept of robustness with datasets from both outdoor and indoor scenarios. In addition, the authors highlight the improvement and overhead of the proposed adaptive prediction scaling method for both scenarios. It improves a non-robust system's performance by up to 13.8 times in a real-world setting.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 4","pages":"106-124"},"PeriodicalIF":1.2,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12042","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83346208","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ashur Rafiev, Alex Yakovlev, Ghaith Tarawneh, Matthew F. Naylor, Simon W. Moore, David B. Thomas, Graeme M. Bragg, Mark L. Vousden, Andrew D. Brown
One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is finding an optimal way of managing communication flows in the massively parallel processing network. Message-passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware-implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade-offs between the related overheads and performance factors on a message-passing prototype machine with 49,152 RISC-V threads distributed over 48 FPGAs (the Partially Ordered Event-Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to perform such a prediction based on the application graph's topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The presented experimental results demonstrate that the correct choice of communication mode, granted by the new model-driven approach, achieves computation that is on average 3.22 times faster than the baseline platform operation.
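The classifier's inputs are cheap, purely structural properties of the application graph. The sketch below computes two of them (diameter and average degree) with plain BFS and applies an invented threshold rule; the paper's classifier, its thresholds and its reconvergence metric are not reproduced here.

```python
# Pick a communication mode from application-graph topology (toy rule).
from collections import deque

def eccentricity(adj, src):
    """Longest shortest-path distance from src, via BFS."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def choose_mode(adj):
    diameter = max(eccentricity(adj, s) for s in adj)
    avg_degree = sum(len(ns) for ns in adj.values()) / len(adj)
    # Invented rule: shallow, well-connected graphs tolerate global barriers.
    return "synchronized" if diameter < 10 and avg_degree > 4 else "asynchronous"

# Triangle with a tail: diameter 2, average degree 2 -> asynchronous.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(choose_mode(adj))
```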
{"title":"Synchronization in graph analysis algorithms on the Partially Ordered Event-Triggered Systems many-core architecture","authors":"Ashur Rafiev, Alex Yakovlev, Ghaith Tarawneh, Matthew F. Naylor, Simon W. Moore, David B. Thomas, Graeme M. Bragg, Mark L. Vousden, Andrew D. Brown","doi":"10.1049/cdt2.12041","DOIUrl":"10.1049/cdt2.12041","url":null,"abstract":"<p>One of the key problems in designing and implementing graph analysis algorithms for distributed platforms is to find an optimal way of managing communication flows in the massively parallel processing network. Message-passing and global synchronization are powerful abstractions in this regard, especially when used in combination. This paper studies the use of a hardware-implemented refutable global barrier as a design optimization technique aimed at unifying these abstractions at the API level. The paper explores the trade-offs between the related overheads and performance factors on a message-passing prototype machine with 49,152 RISC-V threads distributed over 48 FPGAs (called the Partially Ordered Event-Triggered Systems platform). Our experiments show that some graph applications favour synchronized communication, but the effect is hard to predict in general because of the interplay between multiple hardware and software factors. A classifier model is therefore proposed and implemented to perform such a prediction based on the application graph topology parameters: graph diameter, degree of connectivity, and reconvergence metric. The presented experimental results demonstrate that the correct choice of communication mode, granted by the new model-driven approach, helps to achieve 3.22 times faster computation time on average compared to the baseline platform operation.</p>","PeriodicalId":50383,"journal":{"name":"IET Computers and Digital Techniques","volume":"16 2-3","pages":"71-88"},"PeriodicalIF":1.2,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cdt2.12041","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85073025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}