
Microprocessors and Microsystems: Latest Articles

Formal timing analysis of gate-level digital circuits using model checking
IF 1.9 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-28 · DOI: 10.1016/j.micpro.2024.105083
Qurat-ul Ain, Osman Hasan

Continued transistor scaling under Moore's law has made digital devices smaller and more complex, which has sharply increased delay variation. Precise and rigorous timing analysis is therefore needed to avoid anomalies during timing analysis. Circuit timing can be verified with simulation-based or static timing analysis (STA) tools, but simulation yields only estimates because it is not exhaustive, and STA can report timing paths that correspond to non-existent functional paths. Formal verification gives sound and complete results and is widely used for functional verification of digital circuits, but its use in timing analysis remains limited. We present a generic framework for formal timing analysis of digital circuits using the Uppaal model checker. The circuit and its timing parameters, expressed as a state transition diagram, are modeled as timed automata in Uppaal. Timing delays are calculated from the corresponding technology parameters, and Quartus Prime Pro is used to obtain information about the circuit's paths. To make the analysis scalable, we also propose a novel path partitioning technique and compare its results with complete path analysis and traditional STA. The formal model is verified against properties that capture timing characteristics of the circuit, such as clock period, critical path, and propagation delay. Modeling and verification of ISCAS-85 and ISCAS-89 benchmark circuits are presented for illustration.
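To make the terms "critical path" and "propagation delay" concrete, the following is a minimal sketch that computes the longest-delay path through a small gate-level netlist. It is not the authors' Uppaal model: it is a plain longest-path computation over a directed acyclic graph, and the netlist, the gate names and the delay values are all hypothetical.

```python
from graphlib import TopologicalSorter

def critical_path(gates, delays):
    """Return (delay, path) of the slowest path through a combinational netlist.

    gates  maps each gate to the list of nodes driving it (its fan-in).
    delays maps each gate to its propagation delay; primary inputs count as 0.
    """
    arrival = {}  # latest arrival time at each node's output
    pred = {}     # fan-in node that determines that arrival time
    # static_order() yields every node after all of its fan-in nodes.
    for node in TopologicalSorter(gates).static_order():
        fanin = gates.get(node, [])
        slowest = max(fanin, key=lambda n: arrival[n], default=None)
        arrival[node] = delays.get(node, 0) + (arrival[slowest] if slowest is not None else 0)
        pred[node] = slowest
    end = max(arrival, key=arrival.get)
    path = [end]
    while pred[path[-1]] is not None:
        path.append(pred[path[-1]])
    return arrival[end], path[::-1]

# Hypothetical netlist: a, b, c are primary inputs.
gates = {"g1": ["a", "b"], "g2": ["b", "c"], "g3": ["g1", "g2"], "out": ["g3", "c"]}
delays = {"g1": 2, "g2": 3, "g3": 1, "out": 2}

delay, path = critical_path(gates, delays)
print(delay, path)
```

Like STA, this sketch considers every structural path, so it can report a path that no input vector actually exercises. The abstract presents the model-checking approach as a way to avoid such non-functional paths.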

Microprocessors and Microsystems, vol. 109, Article 105083.
Citations: 0
Design of a low-area hardware architecture to predict early signs of sudden cardiac arrests
IF 1.9 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-13 · DOI: 10.1016/j.micpro.2024.105082
Anusaka Gon, Atin Mukherjee

Sudden cardiac arrest (SCA) results in an unexpected and untimely death within minutes, and its early prediction can alert cardiac patients to a timely medical diagnosis. To detect early symptoms of an SCA, the detection and classification of ventricular tachycardias (VT) are of utmost importance. In this work, a low-area yet highly accurate hardware architecture for VT classification is proposed based on the detection of premature ventricular contraction (PVC) beats. After pre-processing of the ECG signals using a wavelet-based pre-processing unit, a characteristics-matching algorithm is used to detect the PVC beats, and a low-complexity adaptive decision-based logic classifier is used to classify them into four types of VTs, namely monomorphic, polymorphic, non-sustained VT (NSVT), and sustained VT (SVT). FPGA verification of the hardware architecture for the VT classifier using the Nexys 4 DDR Artix-7 board utilizes 10.4% of the total available resources and displays the type of VT and the number of PVCs detected to help in determining the severity of SCA and the need for medical attention. The ASIC implementation of the proposed PVC-based VT classification using the SCL 180 nm CMOS technology results in an area overhead of 0.02 mm² and a power consumption of 3.47 μW for a high accuracy rate of 98.2%. When compared to the existing CA detection systems for wearable devices, the proposed one consumes the least area while achieving high detection rates.

Microprocessors and Microsystems, vol. 109, Article 105082.
Citations: 0
An automated consistency management approach for a privacy-aware electric vehicle architecture
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-06-06 · DOI: 10.1016/j.micpro.2024.105074
Jonathan Stancke, Christian Plappert, Lukas Jäger

Modern vehicles contain a number of highly connected embedded systems that generate, store, and process information and exchange it with their environment. Since a large part of this information is privacy-critical, privacy laws such as the GDPR of the European Union apply to it. In this work, we evaluate the privacy-criticality of exemplary data and data flows of the electric driving domain on a reference architecture. We categorize the ECUs of the architecture based on the criticality of the data they process and propose measures and technologies as building blocks that provide adequate privacy protection according to the requirements given by the GDPR.

To ensure that all requirements are met by the reference architecture, we propose a more principled solution that simplifies the mapping between an architecture and the measures. For this purpose, we propose an architecture description template in JSON and an algorithm for automated consistency checks that outputs the measures and the security extension needed per Electronic Control Unit (ECU) to comply with derived privacy requirements.
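As an illustration of what an automated consistency check over a JSON architecture description might look like, here is a minimal sketch. The JSON schema, the ECU names, the data items, the criticality levels and the required measures are all hypothetical. The abstract does not give the authors' template or algorithm, so this shows only the general idea of deriving the missing measures per ECU.

```python
import json

# Hypothetical architecture description; the field names are assumptions.
ARCH = json.loads("""
{"ecus": [
  {"name": "charging_ecu", "data": ["location", "payment_id"], "measures": ["encryption"]},
  {"name": "battery_ecu",  "data": ["cell_voltage"],           "measures": []}
]}
""")

# Hypothetical privacy criticality of each data item.
CRITICALITY = {"location": "high", "payment_id": "high", "cell_voltage": "low"}

# Hypothetical measures required for each criticality level.
REQUIRED = {"high": {"encryption", "access_control"}, "low": set()}

def missing_measures(arch):
    """Map each ECU name to the sorted list of required measures it lacks."""
    report = {}
    for ecu in arch["ecus"]:
        needed = set()
        for item in ecu["data"]:
            needed |= REQUIRED[CRITICALITY[item]]
        gap = needed - set(ecu["measures"])
        if gap:
            report[ecu["name"]] = sorted(gap)
    return report

print(missing_measures(ARCH))
```

An ECU that handles only low-criticality data produces no entry, so the report lists only the ECUs that need additional measures.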

Microprocessors and Microsystems, vol. 109, Article 105074. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000693/pdfft?md5=e4034fe6211d68785c24aa81ea2401f7&pid=1-s2.0-S0141933124000693-main.pdf
Citations: 0
Improving performance of simultaneous multithreading CPUs using autonomous control of speculative traces
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-05-26 · DOI: 10.1016/j.micpro.2024.105073
Ryan F. Ortiz, Wei-Ming Lin

Simultaneous multithreading (SMT) lets a processor execute multiple independent threads concurrently while sharing certain datapath components, which reduces resource waste. Speculative execution lets these processors exploit instruction-level parallelism, but a misspeculation wastes both the shared resources occupied by the mispredicted instructions and the clock cycles spent on them. In this paper we show that, on average, 13% of instructions are flushed because of incorrect predictions. These flushed instructions may have occupied shared resources that other, non-speculative threads could have used. This paper proposes a technique that dynamically adjusts how many speculative instructions a thread can decode and rename, aiming to reduce the waste of shared resources. Our simulation results show that the proposed technique reduces the average rate of flushed instructions by 23% and improves average throughput by 13%.

Microprocessors and Microsystems, vol. 108, Article 105073.
Citations: 0
Advancements on IoT and AI applied to Pneumology
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-05-18 · DOI: 10.1016/j.micpro.2024.105062
Enrico Cambiaso, Sara Narteni, Ilaria Baiardini, Fulvio Braido, Alessia Paglialonga, Maurizio Mongelli

The objective of this work is to design a technological platform for remote monitoring of patients with Chronic Obstructive Pulmonary Disease (COPD). The framework is intended to advance the medical, scientific and technological state of the art by engaging patients in the treatment plan and supporting their interaction with healthcare professionals. The proposed platform supports a new paradigm for managing patients with COPD by using Artificial Intelligence algorithms to integrate clinical data with parameters monitored in daily life. The doctor thus receives a dynamic picture of the disease and its impact on lifestyle, and vice versa, and can plan more personalized diagnostics, therapeutics, and social interventions. This strategy allows access to outpatient care to be organized more effectively and therefore reduces emergencies and hospitalizations, because exacerbations of the disease can be better prevented and monitored. Hence, it can improve patients' quality of life and lower costs for the healthcare system.

Microprocessors and Microsystems, vol. 108, Article 105062. Open access PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000577/pdfft?md5=04b32d737cc9dd247636adf8505b415a&pid=1-s2.0-S0141933124000577-main.pdf
Citations: 0
A two stage pipeline architecture for hardware implementation of multi-level decomposition of 1-D framelet transform
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-05-18 · DOI: 10.1016/j.micpro.2024.105064
Kasetty Praveen Kumar, Aniruddha Kanhe

This paper proposes a two-stage pipeline architecture for computing the multilevel decomposition of the framelet transform. To handle the problem of perfect reconstruction, an area-efficient symmetric extension router is used; it duplicates the appropriate number of input data samples at the boundary and reflects them about the symmetry axis. In addition, to reduce both the clock period and the number of clock cycles required to compute the framelet transform, inter-stage and intra-stage pipelining of the computational units is maximized. Inter-stage pipelining is obtained by distributing the levels of decomposition among the computational units of the two stages, and a synchronization mechanism is adopted to reduce the total number of clock cycles. Similarly, intra-stage pipelining is achieved with pipeline registers, so that the clock period is limited to the delay of the multiplier-and-accumulator (MAC) circuit of the finite-impulse response (FIR) filter. To validate the feasibility and functionality of the proposed hardware architecture, the design is implemented on an Artix-7 XC7A100TCSG324-1 field-programmable gate array (FPGA) for a framelet transform with one low-pass and two high-pass filters. The proposed architecture operates at a maximum clock frequency of 112 MHz.
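The data flow described above (boundary extension, one low-pass and two high-pass FIR filters, repeated on the low-pass output) can be sketched in software as follows. This is only a functional illustration, not the authors' hardware design. The 3-tap coefficients are placeholders rather than a real framelet filter bank, and the extension convention (edge sample repeated) and the downsampling by 2 are both assumptions.

```python
def symmetric_extend(x, k):
    """Mirror k samples at each boundary, repeating the edge sample."""
    return x[:k][::-1] + x + x[-k:][::-1]

def fir_valid(x, h):
    """FIR filter producing only the outputs whose whole window lies inside x."""
    m = len(h)
    return [sum(h[i] * x[n + m - 1 - i] for i in range(m))
            for n in range(len(x) - m + 1)]

# Placeholder 3-tap filters (assumption): one low-pass, two high-pass.
LOW = [0.25, 0.5, 0.25]
HIGH1 = [-0.25, 0.5, -0.25]
HIGH2 = [0.5, 0.0, -0.5]

def analyze_level(x):
    """One decomposition level: extend, filter with all three filters, keep every 2nd sample."""
    ext = symmetric_extend(x, 1)  # 3-tap filters need one extra sample per side
    return tuple(fir_valid(ext, h)[::2] for h in (LOW, HIGH1, HIGH2))

def decompose(x, levels):
    """Multilevel decomposition: each level re-filters the previous low-pass output."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d1, d2 = analyze_level(approx)
        details.append((d1, d2))
    return approx, details

approx, details = decompose([1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0], levels=2)
print(approx, details)
```

Each level depends only on the previous level's low-pass output. That dependency is what allows the levels to be distributed between two hardware stages. The sketch runs the levels sequentially and does not model the pipelining itself.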

Microprocessors and Microsystems, vol. 108, Article 105064.
Citations: 0
FPGA-based stereo matching for crop height measurement using monocular camera
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-05-14 · DOI: 10.1016/j.micpro.2024.105063
Iman Firmansyah, Yoshiki Yamaguchi, Tsutomu Maruyama, Yuta Matsuura, Zhang Heming, Shin Kawai, Hajime Nobuhara

We have proposed a hardware-accelerated drone that analyzes the condition of farmland on site; as a first step, we report that the proposed system can measure crop height with high accuracy using a monocular camera. The three-dimensional farmland model is generated by stereo matching: a drone with a monocular camera takes ground images from two positions, so the distance between those positions serves as an extended parallax distance. This improves the accuracy of the reconstructed 3D farmland. In addition, to support real-time computation and low power consumption, the proposed hardware design accelerates the image processing. To achieve this, we propose a strategy that combines semi-global matching (SGM) along a single path direction with a sum of absolute differences (SAD) computed over a reduced disparity search length. SGM is used to smooth the disparity map before the consistency check, and the scan runs in one direction only, from left to right, to shorten computation time. Experimental results show a computation time of 0.77 s on a Xilinx Zynq ZCU102 FPGA for stereo data set images with a resolution of 1536 × 1024 pixels. To meet real-time requirements and reduce FPGA resources for lower power consumption, the experiments also examine reducing the disparity search length for the SAD computation. In our experiment, the execution time is less than 40 milliseconds and the circuit size is around 9,500 LUTs, which fits a small FPGA. Finally, we also estimated object heights: 0.43 m for an object with a physical height of 0.45 m, and 0.63 m for an object with a physical height of 0.65 m.
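The SAD matching step with a bounded disparity search can be sketched for a single scan line as follows. This is a simplified software illustration under stated assumptions, not the authors' FPGA design: it omits the SGM smoothing and the consistency check, and the pixel values, window size, focal length and baseline are hypothetical. The depth formula is the standard stereo triangulation relation Z = f·B/d, which the abstract does not confirm as the authors' method.

```python
def sad_disparity(left, right, x, half, max_disp):
    """Disparity at pixel x of a left scan line, by minimum sum of absolute differences.

    The window covers pixels x-half .. x+half; the search is limited to
    0 .. max_disp, which is what bounds the work per pixel.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if x - half - d < 0:  # shifted window would leave the image
            break
        cost = sum(abs(left[x + i] - right[x + i - d]) for i in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

def depth_from_disparity(d, focal_px, baseline_m):
    """Standard triangulation: depth = focal length (pixels) * baseline (m) / disparity (pixels)."""
    return focal_px * baseline_m / d

# Hypothetical scan lines: the left line is the right line shifted by 2 pixels.
right = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
left = [0, 0, 0, 0, 9, 5, 7, 0, 0, 0]

d = sad_disparity(left, right, x=5, half=1, max_disp=4)
print(d)
```

Under this model, an object's height would follow as the difference between the depth of the ground and the depth of the object's top, both seen from the same camera altitude. Shrinking `max_disp` reduces the number of candidate costs per pixel, which is the resource and latency lever the abstract describes.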

Microprocessors and Microsystems, vol. 108, Article 105063.
Citations: 0
OpSAVE: Eviction Based Scheme for Efficient Optical Network-on-Chip
IF 2.6 · Zone 4, Computer Science · Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2024-05-11 · DOI: 10.1016/j.micpro.2024.105061
Uzmat Ul Nisa, Janibul Bashir

For on-chip networks, nanophotonics is considered a strong alternative owing to its high speed (due to low latency) and high bandwidth (due to wavelength division multiplexing). However, the major hurdle to adopting nanophotonic on-chip networks is their high static power consumption. Several proposals in the literature try to reduce static power consumption either by modulating the laser or by allowing the on-chip stations to share the photonic channels. In this paper, we propose OpSAVE, an optical NoC that combines these two strategies to effectively reduce static power consumption. It introduces a prediction mechanism based on eviction details from the private caches. It shows how shared channels can be used to balance the load dynamically while also handling mispredictions. It allows the optical stations to share both the power and the available bandwidth to increase their utilization. Moreover, OpSAVE uses a double pumping strategy to improve system performance. We compared our scheme with state-of-the-art proposals in this domain, and the results show that it consumes 4.4× less optical power while improving performance by nearly 28%. The evaluation uses multicore benchmarks from the Splash and Parsec benchmark suites.

Microprocessors and Microsystems, vol. 108, Article 105061.
Citations: 0
Low latency FPGA implementation of NTT for Kyber
IF 2.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-04-20 DOI: 10.1016/j.micpro.2024.105059
Mohamed Saoudi, Akram Kermiche, Omar Hocine Benhaddad, Nadir Guetmi, Boufeldja Allailou

This paper presents an FPGA implementation of the Number Theoretic Transform (NTT) for the Kyber Post-Quantum Cryptographic (PQC) standard. NTT is the slowest process within Kyber, so considerable effort has gone into improving its computational efficiency. Leveraging parallelism and dedicated multipliers, our design achieves state-of-the-art latency, performing NTT/INTT in just 0.4/0.5 μs, at least 3.75/3 times faster than existing designs. The proposed design is implemented on the cost-effective Artix-7 FPGA.
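
For readers unfamiliar with the transform this abstract refers to, the following is a minimal software sketch of the seven-layer NTT and its inverse using the standard Kyber parameters (q = 3329, n = 256, ζ = 17). It is not the authors' FPGA architecture: the function names, the plain modular arithmetic (no Montgomery or Barrett reduction), and the sequential loop structure are all assumptions made for illustration.

```python
# Illustrative reference model only; not the hardware design from the paper.
Q = 3329    # Kyber modulus
N = 256     # number of polynomial coefficients
ZETA = 17   # primitive 256th root of unity modulo Q


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


# Twiddle factors stored in bit-reversed order, one per butterfly block.
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def ntt(poly):
    """Forward transform: 7 butterfly layers, in-place on a copy."""
    r = [c % Q for c in poly]
    k = 1
    length = 128
    while length >= 2:
        for start in range(0, N, 2 * length):
            z = ZETAS[k]
            k += 1
            for j in range(start, start + length):
                t = z * r[j + length] % Q
                r[j + length] = (r[j] - t) % Q
                r[j] = (r[j] + t) % Q
        length //= 2
    return r


def intt(values):
    """Inverse transform: undo each layer, then scale by 128^-1 mod Q."""
    r = [c % Q for c in values]
    length = 2
    while length <= 128:
        k = 128 // length  # first twiddle index the forward pass used for this layer
        for start in range(0, N, 2 * length):
            z_inv = pow(ZETAS[k], -1, Q)  # modular inverse (Python 3.8+)
            k += 1
            for j in range(start, start + length):
                a, b = r[j], r[j + length]
                r[j] = (a + b) % Q
                r[j + length] = (a - b) * z_inv % Q
        length *= 2
    n_inv = pow(128, -1, Q)  # each of the 7 layers contributes a factor of 2
    return [c * n_inv % Q for c in r]
```

Calling `intt(ntt(p))` on any list `p` of 256 coefficients in the range 0 to 3328 returns `p` unchanged. The inner `j` loops are independent within a layer, which is the kind of parallelism a hardware implementation can exploit.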

Microprocessors and Microsystems, vol. 107, Article 105059.
Citations: 0
ExTern: Boosting RISC-V core performance using ternary encoding
IF 2.6 Zone 4 (Computer Science) Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-04-15 DOI: 10.1016/j.micpro.2024.105058
Farhad EbrahimiAzandaryani, Dietmar Fey

This paper presents an effective μ-architectural design method, called ExTern, to enhance the performance of a RISC-V processor experiencing computation bottlenecks. ExTern integrates the Canonical Signed Digit (CSD) representation, a ternary number system enabling carry/borrow-free addition/subtraction in constant time O(1), into the RISC-V processor, particularly into the execution stage. Furthermore, it adopts an extended six-stage pipeline architecture to maximize the benefits of the encoding, further improving overall execution time and throughput. Even though FPGAs provide optimized circuits for binary encoding, such as fast carry chain (CARRY4) modules, the customized processor applying ExTern, RISC-VT, shows a marked improvement in computation performance. Experimental results demonstrate a 34.3% (12.2%) improvement in working frequency, leading to 31% (11.5%) lower execution time and a 32% (12%) increase in throughput compared to a state-of-the-art open-source five-stage (six-stage) RISC-V processor.
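
To make the signed-digit encoding mentioned in this abstract concrete, here is a small sketch that converts an integer to canonical signed digit form, where each digit is -1, 0, or +1 and no two adjacent digits are nonzero. It illustrates the number representation only; it does not model the ExTern execution stage or its adder, and the function names `to_csd` and `from_csd` are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of n, least significant digit first."""
    digits = []
    while n != 0:
        if n % 2:
            # Choose +1 or -1 so that the remainder becomes divisible by 4,
            # which forces the next digit to be 0 (the non-adjacency property).
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits or [0]


def from_csd(digits):
    """Evaluate a least-significant-first list of digits in {-1, 0, 1}."""
    return sum(d << i for i, d in enumerate(digits))
```

For example, 7 is encoded as 8 - 1, so `to_csd(7)` yields the digits `[-1, 0, 0, 1]` (least significant first), which uses two nonzero digits instead of the three ones in binary 111.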

Microprocessors and Microsystems, vol. 107, Article 105058. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S014193312400053X/pdfft?md5=5219c364add625230da3e174054a963d&pid=1-s2.0-S014193312400053X-main.pdf
Citations: 0