Microprocessors and Microsystems: Latest Articles (Page 3)

Recent advances in Machine Learning based Advanced Driver Assistance System applications

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-09-12 DOI: 10.1016/j.micpro.2024.105101

Guner Tatar, Salih Bayar, Ihsan Cicek, Smail Niar

In recent years, rising traffic in modern cities has demanded novel technology to support drivers and protect passengers and other third parties involved in transportation. Thanks to rapid technological progress and innovation, many Advanced Driver Assistance Systems (A/DAS) based on Machine Learning (ML) algorithms have emerged to address the increasing demand for practical A/DAS applications. Fast and accurate execution of A/DAS algorithms is essential for preventing loss of life and property. High-speed hardware accelerators are vital for processing the high volume of data captured by increasingly sophisticated sensors and for executing the complex mathematical models of modern deep learning (DL) algorithms. One of the fundamental challenges in this new era is to design energy-efficient and portable ML-enabled platforms for vehicles to provide driver assistance and safety. This article presents recent progress in ML-driven A/DAS technology to offer new insights for researchers. We cover standard ML models and optimization approaches based on widely accepted open-source frameworks used extensively in A/DAS applications. We also highlight related articles on ML and its sub-branches, neural networks (NNs) and DL, and report implementation issues, benchmarking problems, and potential challenges for future research. Popular embedded hardware platforms used to implement A/DAS applications, such as Field Programmable Gate Arrays (FPGAs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Application Specific Integrated Circuits (ASICs), are compared with respect to performance and resource utilization. We examine the hardware and software development environments used in implementing A/DAS applications and report their advantages and disadvantages. We provide performance comparisons for common A/DAS tasks such as traffic sign recognition, road and lane detection, vehicle and pedestrian detection, driver behavior, and multitasking. Considering current research dynamics, A/DAS will remain one of the most popular application fields in vehicular transportation in the near future.

Volume 110, Article 105101.

Citations: 0

Proactive deadlock prevention based on traffic classification sub-graphs for triplet-based NoC TriBA-cNoC

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-08-31 DOI: 10.1016/j.micpro.2024.105091

Karim Soliman, Shi Feng, Ruan Shengqiang, Chunfeng Li

Network topology and routing algorithms are pivotal design decisions that strongly affect the performance of Network-on-Chip (NoC) systems. As core counts rise, so does competition for shared resources, which makes carefully designed, deadlock-avoiding routing algorithms critical for network efficiency. This research builds on the Triplet-Based Architecture (TriBA) and its Distributed Minimal Routing Algorithm (DM4T) to overcome the limitations of previous approaches. While DM4T exhibits performance advantages over previous routing algorithms, its deterministic nature and potential for circular dependencies during routing can lead to deadlocks and congestion. This work therefore addresses these vulnerabilities while retaining the performance benefits of TriBA and DM4T. It introduces a novel approach that merges a proactive deadlock prevention mechanism with Intermediate Adjacent Shortest Path Routing (IASPR). This combination guarantees both deadlock-free and livelock-free routing, ensuring reliable communication within the network. The key to this integration is a data transfer categorization technique based on a flow model, which prevents the formation of circular dependencies and also reduces redundant distance calculations during routing. By addressing these challenges, the proposed approach improves both routing latency and throughput. To assess the performance of TriBA network topologies under varying configurations, extensive simulations were undertaken. The investigation covered TriBA networks of 9 nodes and of 27 nodes, employing the DM4T and IASPR routing algorithms and the proactive deadlock prevention method. The gem5 simulator, running the Garnet 3.0 network model with a standalone protocol for synthetic traffic, was used for simulations at high injection rates across diverse synthetic traffic patterns and PARSEC benchmark suite applications. The simulations showed reductions in average latency of 40.17% and 34.05% compared to the lookup table and DM4T, respectively. Additionally, there were notable increases in average throughput of 7.48% and 5.66%.
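
The circular dependencies mentioned in the abstract can be illustrated with a channel dependency graph: an acyclic graph of "channel A may wait on channel B" edges is a standard sufficient condition for deadlock-free routing. The sketch below is not the paper's sub-graph classification method. It is a generic depth-first cycle check, and the channel names (`c01`, `c12`, `c20`) and the three-node ring are hypothetical.

```python
from typing import Dict, List


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Return True if the directed channel dependency graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in deps}
    for targets in deps.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in deps.get(node, []):
            if color[nxt] == GRAY:  # back edge: a circular wait is possible
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))


# Hypothetical 3-node ring: each channel may wait on the next one.
ring = {"c01": ["c12"], "c12": ["c20"], "c20": ["c01"]}
# Same ring with one dependency forbidden by a routing restriction.
restricted = {"c01": ["c12"], "c12": ["c20"], "c20": []}

print(has_cycle(ring), has_cycle(restricted))
```

Removing a single dependency edge is enough to break the cycle here. A prevention scheme has to decide which dependencies to forbid while keeping every destination reachable.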

Volume 110, Article 105091.

Citations: 0

A novel lightweight multi-factor authentication scheme for MQTT-based IoT applications

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-08-30 DOI: 10.1016/j.micpro.2024.105088

Manasha Saqib, Ayaz Hassan Moon

Current authentication solutions for the Internet of Things (IoT) are either inadequate or computationally intensive, given the resource-constrained nature of IoT devices. This challenges researchers to devise efficient ways of embedding an important security tenet like authentication. In IoT, the most popular machine-to-machine communication protocol at the application layer is Message Queuing Telemetry Transport (MQTT). However, the MQTT protocol inherently lacks security-related functions such as authentication, authorization, confidentiality, access control, and data integrity, which is unacceptable for IoT-driven mission-critical applications connected over public networks. In such situations, security is hardened by employing a transport layer security protocol like TLS, which entails significant computational overhead. This paper presents a novel scheme that enhances MQTT security with lightweight multi-factor authentication based on elliptic curve cryptography. The proposed scheme uses a low-cost signature and a fuzzy extractor to correct errors in imprinted biometrics in noisy environments. The scheme attains mutual authentication, generates a securely agreed-upon session key for secret communication, and guarantees perfect forward secrecy. Furthermore, a rigorous informal security analysis shows that the proposed scheme resists cryptographic attacks, including known session critical attacks. In addition, an empirical study assesses the effectiveness of the proposed scheme in the Cooja simulated environment.
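
The role of the fuzzy extractor can be illustrated with a toy code-offset construction: a random key is encoded with an error-correcting code, masked by the enrolled biometric, and later recovered from a noisy reading. The abstract does not describe the paper's construction, so everything here is an assumption for illustration: the repetition code, the block length `REP`, the bit strings, and the function names `gen` and `rep`. It is not suitable for production use.

```python
import hashlib
import secrets
from typing import List, Tuple

REP = 5  # each key bit is repeated 5 times, so up to 2 bit flips per block are corrected


def _digest(bits: List[int]) -> str:
    """Hash the recovered key bits into a fixed-length key string."""
    return hashlib.sha256(bytes(bits)).hexdigest()


def gen(biometric: List[int]) -> Tuple[str, List[int]]:
    """Enrollment: derive a key and public helper data from a biometric bit string."""
    assert len(biometric) % REP == 0
    key_bits = [secrets.randbelow(2) for _ in range(len(biometric) // REP)]
    codeword = [b for b in key_bits for _ in range(REP)]  # repetition encoding
    helper = [w ^ c for w, c in zip(biometric, codeword)]  # code-offset mask
    return _digest(key_bits), helper


def rep(noisy: List[int], helper: List[int]) -> str:
    """Reproduction: recover the key from a noisy reading plus the helper data."""
    noisy_codeword = [w ^ h for w, h in zip(noisy, helper)]
    key_bits = [
        int(sum(noisy_codeword[i:i + REP]) > REP // 2)  # majority vote per block
        for i in range(0, len(noisy_codeword), REP)
    ]
    return _digest(key_bits)


enrolled = [1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
key, helper = gen(enrolled)

noisy = list(enrolled)
for position in (0, 7, 12):  # one flipped bit in three different blocks
    noisy[position] ^= 1

print(rep(noisy, helper) == key)
```

Only the helper data needs to be stored. The key is reproduced whenever a fresh reading stays within the error-correction capacity of the code.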

Volume 110, Article 105088.

Citations: 0

Test generation algorithm for QCA circuits targeting novel defects and its corresponding fault models

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-08-30 DOI: 10.1016/j.micpro.2024.105090

Vaishali Dhare, Usha Mehta

Given the scaling limitations of current Complementary Metal Oxide Semiconductor (CMOS) technology, Quantum-dot Cellular Automata (QCA) is emerging as one of the alternatives. Because QCA operates at the molecular scale, defects are more likely to occur. Substantial work is therefore required on QCA-oriented defects, their corresponding fault models, and test generation. In this paper, a test generation algorithm for QCA combinational circuits is proposed. The FAN (fan-out-oriented) test generation algorithm is extended for QCA. The proposed Automatic Test Pattern Generator (ATPG) for QCA targets the Single Stuck-at Fault (SSF) set produced by novel Multiple Missing Cells (MMC) defects. The proposed ATPG is based on QCA-oriented test generation properties and guided by the proposed testability measures.

The MCNC benchmark circuits are synthesized into QCA using the proposed synthesis algorithms to check the effectiveness of the proposed ATPG. The ATPG is developed in C++ and tested on the MCNC benchmark circuits. Further, the ATPG-generated test vectors are validated at the QCA device level to demonstrate their correctness. The QCADesigner-E tool is used for the device-level implementation of the MCNC benchmark circuits.
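
To make the notion of a stuck-at test vector concrete, the sketch below enumerates which input vectors detect each single stuck-at fault on one three-input majority gate, the basic logic primitive of QCA. It uses exhaustive simulation, not the FAN-based ATPG the paper proposes, and it does not model the MMC defects themselves. The line names and helper functions are assumptions for illustration.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Fault = Tuple[str, int]  # (line name, stuck value)
LINES = ("a", "b", "c", "out")


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority function."""
    return int(a + b + c >= 2)


def simulate(vector: Tuple[int, int, int], fault: Optional[Fault] = None) -> int:
    """Evaluate the gate, optionally forcing one line to a stuck value."""
    values = dict(zip("abc", vector))
    if fault and fault[0] in values:
        values[fault[0]] = fault[1]  # stuck-at fault on an input line
    out = majority(values["a"], values["b"], values["c"])
    if fault and fault[0] == "out":
        out = fault[1]  # stuck-at fault on the output line
    return out


def detecting_vectors(fault: Fault) -> List[Tuple[int, int, int]]:
    """All input vectors whose faulty output differs from the fault-free output."""
    return [v for v in product((0, 1), repeat=3) if simulate(v) != simulate(v, fault)]


faults = [(line, value) for line in LINES for value in (0, 1)]
table: Dict[Fault, List[Tuple[int, int, int]]] = {f: detecting_vectors(f) for f in faults}

for fault, vectors in table.items():
    print(f"{fault[0]} stuck-at-{fault[1]}: {vectors}")
```

An input fault is only observable when the other two inputs disagree, which is why each input stuck-at fault has exactly two detecting vectors. Algorithms such as FAN reach the same kind of result on large circuits without enumerating every vector.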

Volume 110, Article 105090.

Citations: 0

Optimized k-Nearest neighbors search implementation on resource-constrained FPGA platforms

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-08-10 DOI: 10.1016/j.micpro.2024.105089

Sandra Djosic, Milica Jovanovic, Goran Lj. Djordjevic

The k-Nearest Neighbors (kNN) algorithm is a fundamental machine learning classification technique with wide-ranging applications. Among various kNN implementation choices, FPGA-based heterogeneous systems have gained popularity due to the FPGA's inherent parallelism, energy efficiency, and reconfigurability. However, implementing the kNN algorithm on resource-constrained embedded FPGA platforms, where limited programmable resources are typically shared among various application-specific hardware units, requires a kNN accelerator architecture that balances high performance, hardware efficiency, and flexibility. To address this challenge, we present a kNN hardware accelerator unit designed to optimize resource utilization by using sequential (accumulation-based) rather than pipelined or parallel distance computations. The proposed architecture incorporates two key algorithmic optimizations to reduce the iteration count of the sequential distance computation loop: a dynamic lower bound that enables early termination of the distance computation, and an online element selection that maximizes partial distance growth per iteration. We further enhance the accelerator's performance by incorporating multiple optimized sequential distance computation units, each dedicated to processing a segment of the training dataset. Our experiments demonstrate that the proposed approach is scalable, making it applicable to various hardware platforms and resource constraints. In particular, when implemented on an AMD Zynq device, the proposed single-core kNN accelerator occupies only 5% of the FPGA's resources while delivering a speedup of 3 to 5 times over the kNN software implementation running on the accompanying ARM A9 processor. For the 8-core kNN accelerator, resource utilization stands at 30%, while the speedup factor ranges between 25 and 35.
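
Early termination of a sequential distance computation can be sketched in software with the classic partial-distance search: squared differences are accumulated one element at a time, and a candidate is abandoned as soon as its partial sum reaches the current k-th best distance. This is a simplified stand-in, not the accelerator's logic. The paper's dynamic lower bound and online element selection are not reproduced, and the function name and sample data are assumptions.

```python
import heapq
from typing import List, Sequence, Tuple


def knn_partial_distance(
    query: Sequence[float], train: Sequence[Sequence[float]], k: int
) -> Tuple[List[Tuple[float, int]], int]:
    """Return the k nearest (squared_distance, index) pairs and the number
    of accumulate steps actually executed."""
    best: List[Tuple[float, int]] = []  # max-heap emulated with negated distances
    steps = 0
    for index, point in enumerate(train):
        # Current k-th best distance; infinite until k candidates are collected.
        threshold = -best[0][0] if len(best) == k else float("inf")
        partial = 0.0
        pruned = False
        for q, p in zip(query, point):
            partial += (q - p) ** 2
            steps += 1
            if partial >= threshold:  # cannot beat the k-th best: stop early
                pruned = True
                break
        if pruned:
            continue
        if len(best) == k:
            heapq.heapreplace(best, (-partial, index))
        else:
            heapq.heappush(best, (-partial, index))
    return sorted((-d, i) for d, i in best), steps


query = [0, 0, 0, 0]
train = [[1, 0, 0, 0], [5, 5, 5, 5], [0, 2, 0, 0], [9, 0, 0, 0]]
neighbors, steps = knn_partial_distance(query, train, k=2)
print(neighbors, steps)  # → [(1.0, 0), (4.0, 2)] 13
```

A full computation would take 16 accumulate steps for this data. The last training point is rejected after its first element, which is the effect an element-ordering heuristic tries to trigger as early as possible.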

Volume 109, Article 105089.

Citations: 0

Mixture-of-Rookies: Saving DNN computations by predicting ReLU outputs

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-07-30 DOI: 10.1016/j.micpro.2024.105087

Dennis Pinto, Jose-María Arnau, Marc Riera, Josep-Llorenç Cruz, Antonio González

Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast number of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be zero or a positive number, in order to skip the computation of those neurons that will likely output zero. Our predictor, named Mixture-of-Rookies, combines two inexpensive components. The first exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a speedup of 1.2x and reducing energy consumption by 16.5% on average for a set of diverse DNNs.
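
The first predictor component rests on the correlation between binarized and full-precision dot products. The toy experiment below measures that correlation on synthetic, zero-mean Gaussian vectors and uses the sign of the binarized dot product to guess whether a ReLU would output zero. The sign-based binarization, the synthetic data, and all names are assumptions. The abstract does not specify the paper's binarization, and real layer inputs are 8-bit and often non-negative.

```python
import math
import random
from typing import List, Sequence


def sign(value: float) -> int:
    """1-bit binarization: +1 for non-negative values, -1 otherwise."""
    return 1 if value >= 0 else -1


def full_dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def binarized_dot(w: Sequence[float], x: Sequence[float]) -> int:
    return sum(sign(wi) * sign(xi) for wi, xi in zip(w, x))


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the experiment is repeatable
dim, samples = 64, 2000
weights = [rng.gauss(0, 1) for _ in range(dim)]
inputs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(samples)]

full = [full_dot(weights, x) for x in inputs]
binary = [float(binarized_dot(weights, x)) for x in inputs]

rho = pearson(full, binary)
# Predict "ReLU outputs zero" when the cheap binarized dot product is <= 0.
agreement = sum((b <= 0) == (f <= 0) for f, b in zip(full, binary)) / samples

print(f"correlation={rho:.2f} zero-prediction agreement={agreement:.2f}")
```

A predictor of this kind is wrong for a noticeable share of neurons, which suggests why the scheme pairs it with a second, clustering-based component.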

Volume 109, Article 105087. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000826/pdfft?md5=f3e30ee4d950e1c93554e32d04ba1b80&pid=1-s2.0-S0141933124000826-main.pdf

Citations: 0

PyIgH: A unified architecture of IgH EtherCAT Master based on Python considering hard real-time constraints

IF 1.9 | Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-07-19 DOI: 10.1016/j.micpro.2024.105085

Raimarius Delgado, Se Yeon Cho, Byoung Wook Choi

The increasing demand for rapid application development tools, especially those employing high-level languages such as Python, has underscored the importance of utilizing a wide array of popular libraries while addressing real-time constraints in distributed hardware systems. This paper introduces PyIgH, a unified architecture of an IgH EtherCAT master based on Python, specifically designed to satisfy hard real-time requirements in an EtherCAT network. Implemented as a Python module, PyIgH exposes the functionalities and capabilities of an open-source EtherCAT master, facilitating seamless configuration and control of EtherCAT slave devices within the Python runtime environment. Real-time adaptation of the POSIX library, encapsulated within Python, is also utilized to satisfy the timing requirements of EtherCAT. The feasibility of the proposed approach is verified by analyzing the real-time performance in terms of periodicity and in-controller delay of the EtherCAT control task with a 1 kHz cycle. Experimental results demonstrate that PyIgH is suitable for hard real-time applications and serves as a valid alternative to conventional low-level EtherCAT masters. Additionally, a practical application involving motion control of a six-axis collaborative robot showcases consistent performance of PyIgH within a real-time multi-tasking environment.
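
The periodicity measurement described in the abstract can be approximated in plain Python with a loop that wakes on absolute deadlines and records the time between consecutive activations. This sketch uses only the standard library on a general-purpose scheduler, so it offers no hard real-time guarantee and does not use PyIgH or any EtherCAT API. The `run_periodic` helper and the empty `control_task` placeholder are assumptions.

```python
import time
from typing import Callable, List

PERIOD_NS = 1_000_000  # 1 kHz control cycle


def run_periodic(task: Callable[[], None], cycles: int, period_ns: int = PERIOD_NS) -> List[int]:
    """Call task() once per period and return the measured periods in nanoseconds."""
    periods: List[int] = []
    previous = None
    next_wakeup = time.monotonic_ns() + period_ns
    for _ in range(cycles):
        remaining = next_wakeup - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)  # sleep until the absolute deadline
        now = time.monotonic_ns()
        if previous is not None:
            periods.append(now - previous)
        previous = now
        task()
        next_wakeup += period_ns  # absolute deadlines avoid accumulating drift
    return periods


def control_task() -> None:
    """Placeholder for the cyclic work, such as exchanging process data."""


periods = run_periodic(control_task, cycles=200)
jitter_ns = [abs(p - PERIOD_NS) for p in periods]
print(f"mean period: {sum(periods) / len(periods):.0f} ns, max jitter: {max(jitter_ns)} ns")
```

On a stock operating system the maximum jitter reported by such a loop is usually far larger than on a real-time kernel. That gap is the one the paper's real-time adaptation of the POSIX library is meant to close.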

Volume 109, Article 105085.

Citations: 0

Full wireless goniometer design with activity recognition for upper and lower limb

IF 1.9, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-07-17 DOI: 10.1016/j.micpro.2024.105086

Cemil Keskinoğlu , Ahmet Aydın

People rely on their upper and lower extremities to carry out their work. The capability of these extremities may decline depending on how frequently they are used or on factors such as age, genetics, and body weight. The joints' range of motion (ROM) is measured to evaluate this decline. Different systems, such as conventional goniometers, mobile phone applications, and sensor-based systems, can measure the ROM value. Still, it can be challenging to measure this parameter in situations such as training or other activities involving movement. A partially wireless goniometer and a companion 3D visualization and control GUI were developed in our previous study. However, it was difficult to mount on limbs at a distance, and it could not be used on both legs to measure the hip angles. Therefore, this study presents a fully wireless goniometer system that can simultaneously measure in real time and show joint movements in a 3D model for the upper and lower extremities. The angle values required for the ROM were measured with two IMU sensors. Two ESP32s were used as microcontrollers in the system, and a fully wireless system was enabled by transferring data via ESP-NOW and Bluetooth. Thanks to ESP-NOW, the system has lower latency than with other protocols and can transmit data over longer distances. The developed system can also perform activity recognition, which is not available in other goniometers. The measurements of the system were compared with those of a conventional goniometer, and the results were found to be completely correlated $(\rho_c = 1)$.
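The abstract does not define $\rho_c$. One common reading of that symbol in method-comparison studies is Lin's concordance correlation coefficient, and the sketch below assumes that reading; it is an illustration, not the authors' analysis. The function name `concordance_ccc` and the angle values are hypothetical.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements.

    Uses population (1/n) moments:
        rho_c = 2*s_xy / (s_xx + s_yy + (mean_x - mean_y)**2)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) * (a - mx) for a in x) / n
    syy = sum((b - my) * (b - my) for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical joint angles in degrees (illustrative only).
reference = [10.0, 45.0, 90.0, 120.0]   # conventional goniometer
identical = [10.0, 45.0, 90.0, 120.0]   # device agreeing exactly
offset = [15.0, 50.0, 95.0, 125.0]      # device with a constant 5 degree bias

ccc_identical = concordance_ccc(reference, identical)
ccc_offset = concordance_ccc(reference, offset)
print(ccc_identical, ccc_offset)
```

Unlike Pearson correlation, this coefficient drops below 1 when one device has a constant bias, which is why it is often preferred for checking agreement between two instruments.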

Volume 109, Article 105086.

Citations: 0

Count overflow and privilege mode filtering extension implementation on a RISC-V on-board processor

IF 1.9, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-07-14 DOI: 10.1016/j.micpro.2024.105084

Andrea Fernández Gallego, Miguel Jiménez Arribas, Iván Gamino del Río, Agustín Martínez Hellín, Manuel Prieto Mateo, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez

RISC-V is a computer architecture that has recently attracted considerable attention due to its advantageous qualities: it is an open instruction set based on reduced and simple instructions. For this reason, it has become an appealing choice for a wide range of computing applications and a disruptive force in a wide variety of fields, including those that involve the development of safety-critical software, such as the space sector. The ability to evaluate the activities performed within a processor is of paramount importance in this type of system to ensure that requirements are fulfilled during space missions. The monitoring of these events inside the processor is managed by an instrument called the Hardware Performance Monitor (HPM). This work shows the implementation of the Sscofpmf extension of the HPM, compliant with the RISC-V privileged specification. The paper details the redesign of the existing performance counters from a previously implemented RISC-V baseline version. A comparison of the resource utilization and power consumption of the two versions is also provided. As expected, the Sscofpmf extension version has a higher resource utilization. Nevertheless, the paper shows that the additional functionalities included in the system have been validated without any changes in the processor clock frequency, so the extension does not introduce any performance overhead.
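The two features named in the title, count overflow and privilege mode filtering, can be sketched as a small behavioural model. This is a simplified illustration of the Sscofpmf semantics as described in the RISC-V privileged specification, not the authors' hardware design. The class `HpmCounter` and its method names are hypothetical, and the virtualization inhibit bits (VSINH, VUINH) are left out for brevity.

```python
# Bit positions in the 64-bit event selector, as defined by Sscofpmf.
OF = 1 << 63    # overflow status; also suppresses further overflow interrupts
MINH = 1 << 62  # inhibit counting in M-mode
SINH = 1 << 61  # inhibit counting in S-mode
UINH = 1 << 60  # inhibit counting in U-mode
MASK64 = (1 << 64) - 1
INHIBIT = {"M": MINH, "S": SINH, "U": UINH}


class HpmCounter:
    """Behavioural model of one performance counter and its event selector."""

    def __init__(self, event_cfg=0, value=0):
        self.event_cfg = event_cfg & MASK64
        self.value = value & MASK64
        self.lcofip = False  # models the local counter-overflow interrupt pending bit

    def tick(self, priv):
        """Register one event occurrence while running in mode 'M', 'S' or 'U'."""
        if self.event_cfg & INHIBIT[priv]:
            return  # privilege mode filtering: the event is not counted
        self.value = (self.value + 1) & MASK64
        if self.value == 0:  # the counter wrapped around
            if not (self.event_cfg & OF):
                self.lcofip = True  # an interrupt is requested only while OF was clear
            self.event_cfg |= OF


# Counter one step from overflow, configured to ignore U-mode events.
ctr = HpmCounter(event_cfg=UINH, value=MASK64)
ctr.tick("U")                  # filtered out, value unchanged
value_after_user = ctr.value
ctr.tick("M")                  # counted, wraps to zero and raises the overflow flag
print(value_after_user == MASK64, ctr.value, ctr.lcofip, bool(ctr.event_cfg & OF))
```

Because OF doubles as an interrupt disable, software has to clear it after handling an overflow before the next one can raise a new request.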

Volume 109, Article 105084. PDF: https://www.sciencedirect.com/science/article/pii/S0141933124000796/pdfft?md5=db2cd71fd8fabeee87eb0b479d1b76cc&pid=1-s2.0-S0141933124000796-main.pdf

Citations: 0

Formal timing analysis of gate-level digital circuits using model checking

IF 1.9, Zone 4 (Computer Science), Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Microprocessors and Microsystems

Pub Date : 2024-06-28 DOI: 10.1016/j.micpro.2024.105083

Qurat-ul Ain, Osman Hasan

Due to the continuous reduction in transistor sizes governed by Moore's law, digital devices have become smaller and more complex, resulting in an enormous rise in delay variations. Therefore, there is a dire need for precise and rigorous timing analysis to overcome anomalies during timing analysis. The timing of digital circuits can be verified using various simulation-based or static timing analysis (STA) based tools, but these either provide estimated results, owing to the inherently non-exhaustive nature of simulation, or report timing paths that correspond to non-existent functional paths, respectively. Formal verification provides complete and sound analysis results and has been widely used for the functional verification of digital circuits, but its application in the timing analysis domain is somewhat limited. We present a generic framework to perform formal timing analysis of digital circuits with the help of the Uppaal model checker. The given digital circuit, along with its timing parameters in the form of a state transition diagram, is modeled using timed automata in the Uppaal model checker. Timing delays are calculated from the corresponding technology parameters, and Quartus Prime Pro is used to obtain information about the circuit's paths. To make the analysis scalable, we also propose a novel path partitioning technique and compare its results with complete path analysis and traditional STA. The formal model is verified with the help of properties to assess timing characteristics such as the clock period, critical path, and propagation delay of the considered circuit. Modeling and verification of ISCAS-85 and ISCAS-89 benchmark circuits are presented for illustration purposes.
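For orientation, the notion of a critical path can be shown with a plain longest-path computation over a gate-level netlist. This is the conventional STA-style calculation that the paper compares against, not the Uppaal timed-automata model it proposes. The function `critical_path`, the gate names, and the delay values are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(gate_delay, fanin):
    """Return (delay, path) of the slowest path through a combinational netlist.

    gate_delay: {node: propagation delay}; missing nodes (e.g. inputs) count as 0.
    fanin:      {node: [nodes that drive it]}; must describe an acyclic circuit.
    """
    arrival, best_pred = {}, {}
    # static_order() yields every node after all of its drivers.
    for node in TopologicalSorter(fanin).static_order():
        preds = fanin.get(node, [])
        latest = max(preds, key=lambda p: arrival[p], default=None)
        best_pred[node] = latest
        start = arrival[latest] if latest is not None else 0
        arrival[node] = start + gate_delay.get(node, 0)
    # Walk back from the node with the latest arrival time.
    end = max(arrival, key=arrival.get)
    path = [end]
    while best_pred[path[-1]] is not None:
        path.append(best_pred[path[-1]])
    return arrival[end], path[::-1]


# Hypothetical three-gate circuit: g3 = OR(AND(a, b), NOT(b)).
delays = {"g1": 2, "g2": 1, "g3": 3}
netlist = {"g1": ["a", "b"], "g2": ["b"], "g3": ["g1", "g2"]}
delay, path = critical_path(delays, netlist)
print(delay, path)
```

This purely structural calculation ignores logic values, so it can report a path that no input pattern ever activates, which is the false-path limitation of STA that the abstract refers to.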

Volume 109, Article 105083.

Citations: 0