Pub Date: 2024-09-01 | Epub Date: 2024-07-19 | DOI: 10.1016/j.micpro.2024.105085
Raimarius Delgado, Se Yeon Cho, Byoung Wook Choi
The increasing demand for rapid application development tools, especially those employing high-level languages such as Python, has underscored the importance of utilizing a wide array of popular libraries while addressing real-time constraints in distributed hardware systems. This paper introduces PyIgH, a unified architecture of an IgH EtherCAT master based on Python, specifically designed to satisfy hard real-time requirements in an EtherCAT network. Implemented as a Python module, PyIgH exposes the functionalities and capabilities of an open-source EtherCAT master, facilitating seamless configuration and control of EtherCAT slave devices within the Python runtime environment. Real-time adaptation of the POSIX library, encapsulated within Python, is also utilized to satisfy the timing requirements of EtherCAT. The feasibility of the proposed approach is verified by analyzing the real-time performance in terms of periodicity and in-controller delay of the EtherCAT control task with a 1 kHz cycle. Experimental results demonstrate that PyIgH is suitable for hard real-time applications and serves as a valid alternative to conventional low-level EtherCAT masters. Additionally, a practical application involving motion control of a six-axis collaborative robot showcases consistent performance of PyIgH within a real-time multi-tasking environment.
Title: PyIgH: A unified architecture of IgH EtherCAT Master based on Python considering hard real-time constraints. Journal: Microprocessors and Microsystems, vol. 109, Article 105085 (Journal Article; impact factor 1.9).
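The periodicity metric mentioned in the abstract can be illustrated with a minimal sketch that computes period jitter from the wake-up timestamps of a cyclic task. This is not PyIgH's API: the function name `period_jitter_ns` and the sample timestamps are hypothetical and serve only to show the arithmetic for a 1 kHz (1 ms) cycle.

```python
from statistics import mean

PERIOD_NS = 1_000_000  # nominal period of a 1 kHz cycle, as in the abstract


def period_jitter_ns(wakeups_ns, period_ns=PERIOD_NS):
    """Return (average, maximum) absolute deviation of observed periods from the nominal one."""
    # Observed period = difference between consecutive wake-up timestamps.
    periods = [b - a for a, b in zip(wakeups_ns, wakeups_ns[1:])]
    deviations = [abs(p - period_ns) for p in periods]
    return mean(deviations), max(deviations)


# Hypothetical timestamps in nanoseconds, for illustration only (not measured data).
samples = [0, 1_000_200, 1_999_900, 3_000_000]
avg_jitter, max_jitter = period_jitter_ns(samples)
```

In a real measurement the timestamps would come from a monotonic clock read at the start of every control cycle; the in-controller delay discussed in the abstract would need a second timestamp per cycle and is not covered here.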
Pub Date: 2024-09-01 | Epub Date: 2024-06-28 | DOI: 10.1016/j.micpro.2024.105083
Qurat-ul Ain, Osman Hasan
Due to the continuous reduction in transistor size governed by Moore's law, digital devices have become smaller and more complex, resulting in an enormous rise in delay variations. Therefore, there is a pressing need for precise and rigorous timing analysis that avoids anomalies. The timing of digital circuits can be verified using simulation-based or static timing analysis (STA) based tools, but simulation provides only estimated results because it is inherently non-exhaustive, and STA may report timing paths that correspond to non-existent functional paths. Formal verification provides complete and sound analysis results and has been widely used for the functional verification of digital circuits, but its application in the timing analysis domain is somewhat limited. We present a generic framework to perform formal timing analysis of digital circuits with the help of the Uppaal model checker. The given digital circuit, along with its timing parameters in the form of a state transition diagram, is modeled using timed automata in Uppaal. Timing delays are calculated from the corresponding technology parameters, and Quartus Prime Pro is used to obtain information about the circuit's paths. To make the analysis scalable, we also propose a novel path partitioning technique and compare its results with complete path analysis and traditional STA. The formal model is verified against properties that assess timing characteristics of the considered circuit, such as the clock period, critical path, and propagation delay. Modeling and verification of ISCAS-85 and ISCAS-89 benchmark circuits are presented for illustration purposes.
Title: Formal timing analysis of gate-level digital circuits using model checking. Journal: Microprocessors and Microsystems, vol. 109, Article 105083 (Journal Article; impact factor 1.9).
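To make the notion of a critical path concrete, the sketch below computes the longest input-to-output delay of a small combinational netlist by a topological traversal. This is the conventional STA-style longest-path calculation, not the timed-automata model built in Uppaal, and it ignores functional (false-path) information, which is exactly the limitation the abstract attributes to STA. The netlist, gate names, and delay values are hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path_delay(gate_delay, fanin):
    """Longest input-to-output delay through a combinational netlist (a DAG).

    fanin maps each gate to the nodes driving it; gate_delay maps gates to delays.
    Primary inputs have no entry in gate_delay and contribute zero delay.
    """
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin.get(node, ())), default=0)
        arrival[node] = latest_input + gate_delay.get(node, 0)
    return max(arrival.values())


# Hypothetical three-gate circuit with delays in arbitrary time units.
fanin = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"]}
gate_delay = {"g1": 2, "g2": 3, "g3": 1}
worst_delay = critical_path_delay(gate_delay, fanin)
```

Here the slowest route is through `g2` and `g3`. A model-checking approach would instead ask whether any reachable state of the timed model violates a bound on this delay.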
Pub Date: 2024-09-01 | Epub Date: 2024-07-14 | DOI: 10.1016/j.micpro.2024.105084
Andrea Fernández Gallego, Miguel Jiménez Arribas, Iván Gamino del Río, Agustín Martínez Hellín, Manuel Prieto Mateo, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez
RISC-V is a computer architecture that has recently attracted considerable attention due to its advantageous qualities: it is an open instruction set based on reduced, simple instructions. For this reason, it has become an appealing choice for a wide range of computing applications and a disruptive force in a wide variety of fields, including those that involve the development of safety-critical software, as in the space sector. The ability to evaluate the activities performed within a processor is of paramount importance in this type of system to ensure that requirements are fulfilled during space missions. The monitoring of these events inside the processor is managed by an instrument called the Hardware Performance Monitor (HPM). This work presents an implementation of the Sscofpmf extension of the HPM, compliant with the RISC-V privileged specification. The paper details the redesign of the existing performance counters from a previously implemented RISC-V baseline version. A comparison of the two versions in terms of resource utilization and power consumption is also provided. As expected, the Sscofpmf version has higher resource utilization. Nevertheless, the paper shows that the additional functionality has been validated without any change in the processor clock frequency, so the extension does not introduce any performance overhead.
Title: Count overflow and privilege mode filtering extension implementation on a RISC-V on-board processor. Journal: Microprocessors and Microsystems, vol. 109, Article 105084 (Journal Article; impact factor 1.9). PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000796/pdfft?md5=db2cd71fd8fabeee87eb0b479d1b76cc&pid=1-s2.0-S0141933124000796-main.pdf
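The two features named in the title, count overflow and privilege mode filtering, can be illustrated with a simplified software model of one counter. This is a behavioural sketch and not the paper's hardware design. The bit positions (OF at bit 63, MINH/SINH/UINH at bits 62 to 60 of the event selector) follow the Sscofpmf layout as recalled here and should be checked against the RISC-V privileged specification; the class and method names are hypothetical, and the virtualization-related inhibit bits are omitted.

```python
OF = 1 << 63    # overflow flag; while set, further overflows raise no interrupt
MINH = 1 << 62  # inhibit counting in M-mode
SINH = 1 << 61  # inhibit counting in S-mode
UINH = 1 << 60  # inhibit counting in U-mode
MASK64 = (1 << 64) - 1
INHIBIT = {"M": MINH, "S": SINH, "U": UINH}


class HpmCounter:
    """Simplified model of one 64-bit event counter and its event-selector bits."""

    def __init__(self, event_cfg=0, value=0):
        self.event_cfg = event_cfg
        self.value = value & MASK64

    def tick(self, mode):
        """Count one event in privilege mode `mode`.

        Returns True when this event would raise an overflow interrupt.
        """
        if self.event_cfg & INHIBIT[mode]:
            return False  # filtered out: counting is inhibited in this mode
        self.value = (self.value + 1) & MASK64
        if self.value != 0:
            return False  # no wrap-around, hence no overflow
        raise_irq = not (self.event_cfg & OF)
        self.event_cfg |= OF  # flag stays set until software clears it
        return raise_irq


# Counter one event away from wrapping, with U-mode events filtered out.
counter = HpmCounter(event_cfg=UINH, value=MASK64)
ignored = counter.tick("U")     # inhibited, counter unchanged
overflowed = counter.tick("M")  # wraps to zero and requests an interrupt
```

In this model, software would clear the OF bit in its interrupt handler before the next overflow can raise a new interrupt.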
Pub Date: 2024-09-01 | Epub Date: 2024-07-17 | DOI: 10.1016/j.micpro.2024.105086
Cemil Keskinoğlu, Ahmet Aydın
People rely on their lower and upper extremities to move and complete their work. Depending on how frequently these extremities are used, or on factors such as age, genetics, and body weight, their mobility may decrease. The joints' range of motion (ROM) is measured to evaluate this decrease. Different systems, such as conventional goniometers, mobile phone applications, and sensor-based systems, can measure the ROM value. Still, it can be challenging to measure this parameter in situations such as training or other activities involving movement. A partially wireless goniometer and a companion 3D visualization and control GUI were developed in our previous study. However, it was difficult to mount on limbs that are far apart, and it could not be used on both legs to measure the hip angles. Therefore, this study presents a fully wireless goniometer system that can measure joint movements in real time and simultaneously show them in a 3D model for the upper and lower extremities. The angle values required for the ROM were measured with two IMU sensors. Two ESP32s were used as microcontrollers in the system, and a fully wireless system was enabled by transferring data via ESP-NOW and Bluetooth. Thanks to ESP-NOW, the system has lower latency than with other protocols and can transmit data over longer distances. The developed system can also perform activity recognition, which is not available in other goniometers. The measurements of the system were compared with those of a conventional goniometer, and the results were found to be completely correlated (ρ_c = 1).
Title: Full wireless goniometer design with activity recognition for upper and lower limb. Journal: Microprocessors and Microsystems, vol. 109, Article 105086 (Journal Article; impact factor 1.9).
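The abstract does not state how the joint angle is derived from the two IMU readings. One common approach, shown here purely as an assumption, is to take the angle between the direction vectors of the two limb segments. The function name `joint_angle_deg` and the example vectors are hypothetical.

```python
import math


def joint_angle_deg(u, v):
    """Angle in degrees between two 3-D segment direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical segment directions: upper arm along x, forearm along y.
elbow_angle = joint_angle_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

A range of motion value would then be the difference between the largest and smallest angle observed over a movement.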
Pub Date: 2024-09-01 | Epub Date: 2024-07-30 | DOI: 10.1016/j.micpro.2024.105087
Dennis Pinto, Jose-María Arnau, Marc Riera, Josep-Llorenç Cruz, Antonio González
Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast number of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be zero or a positive number, in order to skip the computation of those neurons that will likely output zero. Our predictor, named Mixture-of-Rookies, combines two inexpensive components. The first one exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second component clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on the analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a speedup of 1.2x and reducing energy consumption by 16.5% on average for a set of diverse DNNs.
Title: Mixture-of-Rookies: Saving DNN computations by predicting ReLU outputs. Journal: Microprocessors and Microsystems, vol. 109, Article 105087 (Journal Article; impact factor 1.9). PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000826/pdfft?md5=f3e30ee4d950e1c93554e32d04ba1b80&pid=1-s2.0-S0141933124000826-main.pdf
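The first predictor component can be sketched as follows: a cheap dot product over sign bits stands in for the full-precision one, and a negative result is taken as a prediction that ReLU will output zero. This is only an illustration of the idea under assumed details. The binarization rule, the `threshold` parameter, and all function names are hypothetical, and the clustering component is not shown.

```python
def sign_bits(vec):
    """Binarize a vector to +1 / -1 (zero is treated as positive)."""
    return [1 if x >= 0 else -1 for x in vec]


def predict_relu_zero(weights, inputs, threshold=0):
    """Predict a zero ReLU output when the binarized dot product falls below threshold."""
    binarized = sum(w * x for w, x in zip(sign_bits(weights), sign_bits(inputs)))
    return binarized < threshold


def relu_dot(weights, inputs):
    """Reference full-precision neuron output."""
    return max(0, sum(w * x for w, x in zip(weights, inputs)))


# Hypothetical 8-bit style values, for illustration only.
inputs = [5, 1, 2, 3]
mostly_negative = [-3, -2, 1, -4]
mostly_positive = [3, 2, -1, 4]
skip_first = predict_relu_zero(mostly_negative, inputs)
skip_second = predict_relu_zero(mostly_positive, inputs)
```

When the prediction is true, the full-precision computation for that neuron could be skipped. A misprediction trades accuracy for savings, which is why the threshold would need tuning in practice.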
Pub Date: 2024-09-01 | Epub Date: 2024-06-13 | DOI: 10.1016/j.micpro.2024.105082
Anusaka Gon, Atin Mukherjee
Sudden cardiac arrest (SCA) results in an unexpected and untimely death within minutes, and its early prediction can alert cardiac patients to a timely medical diagnosis. To detect early symptoms of an SCA, the detection and classification of ventricular tachycardias (VT) are of utmost importance. In this work, a low-area yet highly accurate hardware architecture for VT classification is proposed based on the detection of premature ventricular contraction (PVC) beats. After pre-processing of the ECG signals using a wavelet-based pre-processing unit, a characteristics-matching algorithm is used to detect the PVC beats, and a low-complexity adaptive decision-based logic classifier is used to classify them into four types of VTs, namely monomorphic, polymorphic, non-sustained VT (NSVT), and sustained VT (SVT). FPGA verification of the hardware architecture for the VT classifier using the Nexys 4 DDR Artix-7 board utilizes 10.4% of the total available resources and displays the type of VT and the number of PVCs detected to help in determining the severity of SCA and the need for medical attention. The ASIC implementation of the proposed PVC-based VT classification using the SCL 180 nm CMOS technology results in an area overhead of 0.02 mm² and a power consumption of 3.47 μW for a high accuracy rate of 98.2%. When compared to the existing CA detection systems for wearable devices, the proposed one consumes the least area while achieving high detection rates.
Title: Design of a low-area hardware architecture to predict early signs of sudden cardiac arrests. Journal: Microprocessors and Microsystems, vol. 109, Article 105082 (Journal Article; impact factor 1.9).
The objective of this work is the design of a technological platform for remote monitoring of patients with Chronic Obstructive Pulmonary Disease (COPD). The concept of the framework is a breakthrough in the state of medical, scientific and technological art, aimed at engaging patients in the treatment plan and supporting interaction with healthcare professionals. The proposed platform is able to support a new paradigm for the management of patients with COPD, by integrating clinical data and parameters monitored in daily life using Artificial Intelligence algorithms. Therefore, the doctor is provided with a dynamic picture of the disease and its impact on lifestyle and vice versa, and can thus plan more personalized diagnostics, therapeutics, and social interventions. This strategy allows for a more effective organization of access to outpatient care and therefore a reduction of emergencies and hospitalizations because exacerbations of the disease can be better prevented and monitored. Hence, it can result in improvements in patients’ quality of life and lower costs for the healthcare system.
Title: Advancements on IoT and AI applied to Pneumology. Authors: Enrico Cambiaso, Sara Narteni, Ilaria Baiardini, Fulvio Braido, Alessia Paglialonga, Maurizio Mongelli. Pub Date: 2024-07-01 | DOI: 10.1016/j.micpro.2024.105062. Journal: Microprocessors and Microsystems, vol. 108, Article 105062 (Journal Article; impact factor 2.6). PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000577/pdfft?md5=04b32d737cc9dd247636adf8505b415a&pid=1-s2.0-S0141933124000577-main.pdf
Pub Date: 2024-07-01 | Epub Date: 2024-05-11 | DOI: 10.1016/j.micpro.2024.105061
Uzmat Ul Nisa, Janibul Bashir
For on-chip networks, nanophotonics has been considered a strong alternative owing to its high speed (due to low latency) and high bandwidth (due to wavelength division multiplexing). However, the major hurdle in the adoption of nanophotonic on-chip networks is their high static power consumption. Various proposals in the literature try to reduce the static power consumption either by modulating the laser or by allowing the on-chip stations to share the photonic channels. In this paper, we propose OpSAVE, an optical NoC that combines these two strategies to effectively reduce static power consumption. It introduces a prediction mechanism based on the eviction details from the private caches. It shows how shared channels can be used to dynamically balance the load and, at the same time, handle mispredictions. It allows the optical stations to share both the power and the available bandwidth to increase their utilization. Moreover, OpSAVE uses a double pumping strategy to improve system performance. We compared our scheme with state-of-the-art proposals in this domain, and the results show that it consumes 4.4x less optical power while improving performance by nearly 28%. The evaluation uses multicore benchmarks from the Splash and Parsec benchmark suites.
Title: OpSAVE: Eviction Based Scheme for Efficient Optical Network-on-Chip. Journal: Microprocessors and Microsystems, vol. 108, Article 105061 (Journal Article; impact factor 2.6).
Pub Date: 2024-07-01 | Epub Date: 2024-05-18 | DOI: 10.1016/j.micpro.2024.105064
Kasetty Praveen Kumar, Aniruddha Kanhe
In this paper, a two-stage pipeline architecture for computing the multilevel decomposition of the framelet transform is proposed. To handle the problem of perfect reconstruction, an area-efficient symmetric extension router is used that duplicates the appropriate number of data samples of the input signal at the boundary, followed by reflection about the symmetry axis. In addition, to reduce the clock period and the number of clock cycles required for computing the framelet transform, the inter-stage and intra-stage pipelining of the computational units is maximized. The inter-stage pipelining is obtained by distributing the various levels of decomposition among the computational units of the two stages, and a synchronization mechanism is adopted to reduce the total number of clock cycles. Similarly, the intra-stage pipelining is achieved by using pipeline registers such that the clock period is limited to the delay of the multiplier and accumulator (MAC) circuit of the finite-impulse response (FIR) filter. To validate the feasibility and functionality of the proposed hardware architecture, the design is implemented on an Artix-7 XC7A100TCSG324-1 field-programmable gate array (FPGA) for the case of a framelet transform with one low-pass and two high-pass filters. The proposed architecture is able to operate at a maximum clock frequency of 112 MHz.
{"title":"A two stage pipeline architecture for hardware implementation of multi-level decomposition of 1-D framelet transform","authors":"Kasetty Praveen Kumar, Aniruddha Kanhe","doi":"10.1016/j.micpro.2024.105064","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108","pages":"Article 105064"},"PeriodicalIF":2.6,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"141138587","RegionNum":4,"RegionCategory":"Computer Science"}
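The symmetric extension and FIR filtering steps described in this abstract can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' hardware design: the abstract does not give the filter coefficients, so the sketch assumes the piecewise-linear B-spline tight framelet (one low-pass filter `H0`, two high-pass filters `H1` and `H2`), whole-sample symmetric extension, and decimation by two at each level. All function names are hypothetical.

```python
import math

# Assumed filters: piecewise-linear B-spline tight framelet.
# The paper's actual coefficients are not stated in the abstract.
H0 = [0.25, 0.5, 0.25]                           # low-pass
H1 = [math.sqrt(2) / 4, 0.0, -math.sqrt(2) / 4]  # first high-pass
H2 = [-0.25, 0.5, -0.25]                         # second high-pass


def symmetric_extend(x, pad):
    """Mirror `pad` samples about each boundary sample (whole-sample symmetry).

    Requires len(x) > pad.
    """
    left = [x[i] for i in range(pad, 0, -1)]  # x[pad], ..., x[1]
    right = [x[-2 - i] for i in range(pad)]   # x[-2], ..., x[-1 - pad]
    return left + list(x) + right


def fir_decimate(x, h):
    """Apply FIR taps `h` (correlation form) and keep every second output."""
    pad = len(h) // 2
    xe = symmetric_extend(x, pad)
    out = []
    for n in range(0, len(x), 2):
        acc = 0.0
        for k, tap in enumerate(h):  # one multiply-accumulate per tap
            acc += tap * xe[n + k]
        out.append(acc)
    return out


def framelet_decompose(x, levels):
    """Return the final approximation and the (d1, d2) detail pair of each level."""
    approx = list(x)
    details = []
    for _ in range(levels):
        d1 = fir_decimate(approx, H1)
        d2 = fir_decimate(approx, H2)
        approx = fir_decimate(approx, H0)  # feeds the next level
        details.append((d1, d2))
    return approx, details
```

Calling `framelet_decompose(signal, 3)` on a list of at least 8 samples yields three detail pairs plus the coarsest approximation. In the proposed architecture the levels are distributed over two hardware stages; here they simply run in a sequential loop.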
Pub Date: 2024-07-01, Epub Date: 2024-05-14, DOI: 10.1016/j.micpro.2024.105063
Iman Firmansyah, Yoshiki Yamaguchi, Tsutomu Maruyama, Yuta Matsuura, Zhang Heming, Shin Kawai, Hajime Nobuhara
We have proposed a hardware-accelerated drone that analyzes the condition of farmland on site; as a first step, we report that the proposed system can measure crop height with high accuracy using a monocular camera. The three-dimensional farmland model is generated by stereo matching: a drone with a monocular camera captures ground images at two positions, and the distance between those positions serves as the parallax distance, which can therefore be extended. This improves the accuracy of the reconstructed 3D farmland. In addition, the proposed hardware design accelerates image processing for real-time computation and low power consumption. To achieve this, we propose a strategy that combines semi-global matching (SGM) along a single path direction with a sum of absolute differences (SAD) computed over a reduced disparity search length. SGM smooths the disparity map before the consistency check, and its scan line runs in one direction only, from left to right, to shorten the computation time. Experimental results show that a Xilinx Zynq ZCU102 FPGA processes stereo data set images with a resolution of 1536 × 1024 pixels in 0.77 s. To meet real-time requirements and reduce FPGA resource usage for lower power consumption, the experiment also examines reducing the disparity search length for the SAD computation. In this experiment, the execution time is less than 40 ms and the circuit size is around 9,500 LUTs, equivalent to a small FPGA. Finally, we estimated object heights: 0.43 m for an object with a physical height of 0.45 m, and 0.63 m for an object with a physical height of 0.65 m.
{"title":"FPGA-based stereo matching for crop height measurement using monocular camera","authors":"Iman Firmansyah, Yoshiki Yamaguchi, Tsutomu Maruyama, Yuta Matsuura, Zhang Heming, Shin Kawai, Hajime Nobuhara","doi":"10.1016/j.micpro.2024.105063","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":"108","pages":"Article 105063"},"PeriodicalIF":2.6,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"141053941","RegionNum":4,"RegionCategory":"Computer Science"}
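The matching strategy in this abstract (SAD costs over a limited disparity range, smoothed by SGM along a single left-to-right path) can be illustrated with a short software sketch. It is a simplified illustration, not the authors' FPGA implementation: the window size, the penalties `p1` and `p2`, and all function names are assumptions, and the consistency check is omitted. The last function applies the standard pinhole stereo relation Z = f·B/d, where the baseline B would be the distance between the two drone positions.

```python
INF = float("inf")


def sad_costs(left, right, y, max_disp, win=1):
    """SAD matching cost for scan line y: costs[x][d] for d in 0..max_disp."""
    h, w = len(left), len(left[0])
    costs = []
    for x in range(w):
        row = []
        for d in range(max_disp + 1):  # reduced disparity search length
            if x - d < 0:
                row.append(INF)        # candidate falls outside the image
                continue
            total = 0
            for dy in range(-win, win + 1):
                for dx in range(-win, win + 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp at the borders
                    xl = min(max(x + dx, 0), w - 1)
                    xr = min(max(x - d + dx, 0), w - 1)
                    total += abs(left[yy][xl] - right[yy][xr])
            row.append(total)
        costs.append(row)
    return costs


def aggregate_left_to_right(costs, p1=1, p2=4):
    """Single-path SGM aggregation along the scan line, left to right."""
    agg = [list(costs[0])]
    for x in range(1, len(costs)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d, c in enumerate(costs[x]):
            best = min(
                prev[d],                                        # same disparity
                prev[d - 1] + p1 if d > 0 else INF,             # change by one
                prev[d + 1] + p1 if d + 1 < len(prev) else INF,
                prev_min + p2,                                  # larger jump
            )
            row.append(c + best - prev_min)
        agg.append(row)
    return agg


def disparity_row(left, right, y, max_disp):
    """Winner-takes-all disparity for every pixel of scan line y."""
    agg = aggregate_left_to_right(sad_costs(left, right, y, max_disp))
    return [min(range(len(r)), key=r.__getitem__) for r in agg]


def depth_from_disparity(focal_px, baseline_m, disp_px):
    """Pinhole stereo depth Z = f * B / d (metres)."""
    if disp_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disp_px
```

Lowering `max_disp` shrinks the inner loop of `sad_costs` proportionally, which is the software counterpart of the reduced disparity search length discussed in the abstract. Assuming a downward-looking camera, an object's height could then be estimated as the difference between the depth of the ground and the depth of the object's top.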