
Microprocessors and Microsystems: Latest Articles

RED-SEA Project: Towards a new-generation European interconnect
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-09-16 | DOI: 10.1016/j.micpro.2024.105102 | Vol. 110, Article 105102
Maria Engracia Gomez, Julio Sahuquillo, Andrea Biagioni, Nikos Chrysos, Damien Berton, Ottorino Frezza, Francesca Lo Cicero, Alessandro Lonardo, Michele Martinelli, Pier Stanislao Paolucci, Elena Pastorelli, Francesco Simula, Matteo Turisini, Piero Vicini, Roberto Ammendola, Carlotta Chiarini, Chiara De Luca, Fabrizio Capuani, Adrián Castelló, Jose Duro, Simon Pickartz
RED-SEA is an H2020 EuroHPC project whose main objective is to prepare a new-generation European interconnect capable of powering the upcoming EU Exascale systems. It aims to deliver an economically viable and technologically efficient interconnect by leveraging European interconnect technology (BXI) combined with standard, mature technology (Ethernet), previous EU-funded initiatives, and open standards and compatible APIs.
To achieve this objective, the RED-SEA project is organized around four key pillars: (i) network architecture and co-design of workload requirements and interconnect, aiming to optimize the fit with the other EuroHPC projects and with the EPI processors; (ii) development of a high-performance, low-latency, seamless bridge with Ethernet; (iii) efficient network resource management, including congestion and Quality-of-Service; and (iv) end-to-end functions implemented at the network edges.
This paper presents the key achievements and results at the project's midterm for each pillar, on the way to the final project objective. In particular: (i) the definition of the network requirements and architecture, together with a list of benchmarks and applications; (ii) beyond progress on the initially planned IPs, the evolution of the BXI3 architecture to natively support Ethernet at a low level, which reduces complexity and brings advantages in cost and power consumption; (iii) the congestion characterization of target applications, and proposals to reduce this congestion through optimized collective communication primitives, injection throttling, and adaptive routing; and (iv) low-latency, high-message-rate endpoint functions and their connection with new open technologies.
Citations: 0
Recent advances in Machine Learning based Advanced Driver Assistance System applications
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-09-12 | DOI: 10.1016/j.micpro.2024.105101 | Vol. 110, Article 105101
Guner Tatar, Salih Bayar, Ihsan Cicek, Smail Niar

In recent years, rising traffic in modern cities has demanded novel technology to support drivers and protect passengers and other third parties involved in transportation. Thanks to rapid technological progress and innovation, many Advanced Driver Assistance Systems (A/DAS) based on Machine Learning (ML) algorithms have emerged to address the increasing demand for practical A/DAS applications. Fast and accurate execution of A/DAS algorithms is essential for preventing loss of life and property. High-speed hardware accelerators are vital for processing the high volume of data captured by increasingly sophisticated sensors and for executing the complex mathematical models of modern deep learning (DL) algorithms. One of the fundamental challenges in this new era is to design energy-efficient and portable ML-enabled platforms for vehicles to provide driver assistance and safety. This article presents recent progress in ML-driven A/DAS technology to offer new insights for researchers. We cover standard ML models and optimization approaches based on widely accepted open-source frameworks used extensively in A/DAS applications. We also highlight related articles on ML and its sub-branches, neural networks (NNs) and DL, and report implementation issues, benchmarking problems, and potential challenges for future research. Popular embedded hardware platforms used to implement A/DAS applications, such as Field Programmable Gate Arrays (FPGAs), Central Processing Units (CPUs), Graphics Processing Units (GPUs), and Application Specific Integrated Circuits (ASICs), are compared in terms of performance and resource utilization. We examine the hardware and software development environments used in implementing A/DAS applications and report their advantages and disadvantages. We provide performance comparisons for common A/DAS tasks such as traffic sign recognition, road and lane detection, vehicle and pedestrian detection, driver behavior, and multitasking. Given current research dynamics, A/DAS will remain one of the most popular application fields in vehicular transportation in the near term.

Citations: 0
Proactive deadlock prevention based on traffic classification sub-graphs for triplet-based NoC TriBA-cNoC
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-08-31 | DOI: 10.1016/j.micpro.2024.105091 | Vol. 110, Article 105091
Karim Soliman, Shi Feng, Ruan Shengqiang, Chunfeng Li

Network topology and routing algorithms are pivotal design decisions that strongly affect the performance of Network-on-Chip (NoC) systems. As core counts rise, so does competition for shared resources, which makes carefully designed, deadlock-avoiding routing algorithms critical to network efficiency. This research builds on the Triplet-Based Architecture (TriBA) and its Distributed Minimal Routing Algorithm (DM4T) to overcome the limitations of previous approaches. While DM4T outperforms earlier routing algorithms, its deterministic nature and its potential for circular dependencies during routing can lead to deadlocks and congestion. This work addresses these vulnerabilities while retaining the performance benefits of TriBA and DM4T. It introduces a novel approach that merges a proactive deadlock prevention mechanism with Intermediate Adjacent Shortest Path Routing (IASPR). The combination guarantees both deadlock-free and livelock-free routing, ensuring reliable communication within the network. The key to this integration is a flow model-based data transfer categorization technique, which prevents the formation of circular dependencies and reduces redundant distance calculations during routing. As a result, the proposed approach improves both routing latency and throughput. To assess the performance of TriBA network topologies under varying configurations, extensive simulations were carried out on TriBA networks of 9 and 27 nodes, using the DM4T and IASPR routing algorithms and the proactive deadlock prevention method. The gem5 simulator, running the Garnet 3.0 network model with a standalone protocol for synthetic traffic, was used for simulations at high injection rates across diverse synthetic traffic patterns and PARSEC benchmark suite applications. The simulations show reductions in average latency of 40.17% and 34.05% compared to the lookup table and DM4T, respectively, along with increases in average throughput of 7.48% and 5.66%.
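
The link between circular dependencies and deadlock mentioned in this abstract can be illustrated with a small check on a channel dependency graph. The sketch below is not the paper's TriBA-cNoC mechanism: it only assumes the classic sufficient condition that deterministic routing is deadlock-free when the channel dependency graph is acyclic. The channel names (`c01`, `c12`, `c20`) and the two traffic classes are hypothetical.

```python
def has_cycle(dependencies):
    """Return True if the directed graph {channel: [next channels]} has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in dependencies.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a circular dependency exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(dependencies))


# Hypothetical three-channel ring: each channel waits on the next one.
ring = {"c01": ["c12"], "c12": ["c20"], "c20": ["c01"]}

# Hypothetical split of the same traffic into two classes, each routed
# over its own sub-graph; neither sub-graph contains a cycle.
class_a = {"c01": ["c12"], "c12": []}
class_b = {"c20": ["c01"], "c01": []}

print(has_cycle(ring))                         # True
print(has_cycle(class_a), has_cycle(class_b))  # False False
```

Splitting traffic so that each class sees an acyclic sub-graph is one generic way to rule out cyclic waits; whether it matches the paper's classification rules is not something the abstract confirms.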

Citations: 0
A novel lightweight multi-factor authentication scheme for MQTT-based IoT applications
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-08-30 | DOI: 10.1016/j.micpro.2024.105088 | Vol. 110, Article 105088
Manasha Saqib, Ayaz Hassan Moon

Current authentication solutions for the Internet of Things (IoT) are either inadequate or too computationally intensive for resource-constrained IoT devices. This challenges researchers to devise efficient ways to embed an important security tenet such as authentication. In IoT, the most popular machine-to-machine communication protocol at the application layer is Message Queuing Telemetry Transport (MQTT). However, MQTT inherently lacks security functions such as authentication, authorization, confidentiality, access control, and data integrity, which is unacceptable for IoT-driven mission-critical applications connected over public networks. In such cases, security is hardened with a transport layer security protocol such as TLS, which entails significant computational overhead. This paper presents a novel scheme that enhances MQTT security through lightweight multi-factor authentication based on elliptic curve cryptography. The proposed scheme uses a low-cost signature and a fuzzy extractor to correct errors in imprinted biometrics in noisy environments. It attains mutual authentication, generates a securely agreed-upon session key for secret communication, and guarantees perfect forward secrecy. Furthermore, a rigorous informal security analysis shows that the proposed scheme resists cryptographic attacks, including known session critical attacks. In addition, an empirical study has been carried out to assess the effectiveness of the proposed scheme in the Cooja simulated environment.
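
To make the role of the fuzzy extractor concrete, here is a toy code-offset construction in Python. It is only a sketch of the general idea (a stable key recovered from a noisy biometric reading plus public helper data) and is not the scheme from the paper. The repetition code, the `REP` parameter, and the function names `generate` and `reproduce` are assumptions chosen for illustration; the sketch is not secure enough for real use.

```python
import hashlib
import secrets

REP = 5  # assumed toy error-correcting code: each secret bit is repeated REP times


def encode(bits):
    """Repetition-encode a list of bits."""
    return [b for b in bits for _ in range(REP)]


def decode(codeword):
    """Majority-decode each block of REP bits back to one bit."""
    return [int(sum(codeword[i:i + REP]) > REP // 2)
            for i in range(0, len(codeword), REP)]


def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def generate(biometric):
    """Enrollment: derive (key, helper) from a biometric bit list.

    The length of `biometric` is assumed to be a multiple of REP.
    """
    secret = [secrets.randbelow(2) for _ in range(len(biometric) // REP)]
    helper = xor(biometric, encode(secret))  # public helper data
    key = hashlib.sha256(bytes(secret)).hexdigest()
    return key, helper


def reproduce(noisy_biometric, helper):
    """Authentication: recover the key from a noisy reading and the helper."""
    secret = decode(xor(noisy_biometric, helper))
    return hashlib.sha256(bytes(secret)).hexdigest()


enrolled = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1]
key, helper = generate(enrolled)

noisy = list(enrolled)
noisy[2] ^= 1   # one flipped bit in the first block
noisy[13] ^= 1  # one flipped bit in the third block
print(reproduce(noisy, helper) == key)  # True
```

With `REP = 5`, up to two flipped bits per five-bit block are corrected; a practical design would use a stronger code and a proper randomness extractor.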

Citations: 0
Test generation algorithm for QCA circuits targeting novel defects and its corresponding fault models
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-08-30 | DOI: 10.1016/j.micpro.2024.105090 | Vol. 110, Article 105090
Vaishali Dhare, Usha Mehta

Given the scaling limitations of current Complementary Metal Oxide Semiconductor (CMOS) technology, Quantum-dot Cellular Automata (QCA) is emerging as one of the alternatives. Because QCA operates at the molecular scale, defects are more likely to occur. Substantial development of QCA-oriented defects, their corresponding fault models, and test generation is therefore required. In this paper, a test generation algorithm for QCA combinational circuits is proposed. The FAN (fanout-oriented) test generation algorithm is extended for QCA. The proposed Automatic Test Pattern Generator (ATPG) for QCA targets the Single Stuck-at Fault (SSF) set produced by novel Multiple Missing Cells (MMC) defects. The proposed ATPG is based on QCA-oriented test generation properties and guided by the proposed testability measures.

The MCNC benchmark circuits are synthesized into QCA using the proposed synthesis algorithms to check the effectiveness of the proposed ATPG. The ATPG is developed in C++ and tested on MCNC benchmark circuits. Further, ATPG-generated test vectors are validated at the QCA device level to demonstrate their correctness. The QCADesigner-E tool is used for the device-level implementation of the MCNC benchmark circuits.
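
As a small illustration of what a test vector for a stuck-at fault means in a QCA setting, the sketch below exhaustively searches the inputs of a single three-input majority gate, the basic QCA logic primitive. This is plain brute-force enumeration, not the FAN-based ATPG described in the abstract, and modeling the defect as a stuck-at value on one gate input is an assumption made here for illustration. The function names are hypothetical.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority function (the basic QCA gate)."""
    return (a & b) | (b & c) | (a & c)


def detecting_vectors(fault_input, stuck_value):
    """List input vectors whose output differs between the good and faulty gate.

    fault_input: index (0, 1, or 2) of the input assumed stuck.
    stuck_value: 0 for stuck-at-0, 1 for stuck-at-1.
    """
    vectors = []
    for bits in product((0, 1), repeat=3):
        faulty = list(bits)
        faulty[fault_input] = stuck_value  # force the faulty input
        if majority(*bits) != majority(*faulty):
            vectors.append(bits)
    return vectors


print(detecting_vectors(0, 0))  # [(1, 0, 1), (1, 1, 0)]
print(detecting_vectors(0, 1))  # [(0, 0, 1), (0, 1, 0)]
```

A vector detects the fault only when the faulty input carries the opposite value and the other two inputs disagree, so that the faulty input decides the majority. This activation and propagation reasoning is what structural ATPG algorithms automate on larger circuits.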

Citations: 0
Optimized k-Nearest neighbors search implementation on resource-constrained FPGA platforms
IF 1.9 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2024-08-10 | DOI: 10.1016/j.micpro.2024.105089 | Vol. 109, Article 105089
Sandra Djosic, Milica Jovanovic, Goran Lj. Djordjevic

The k-Nearest Neighbors (kNN) algorithm is a fundamental machine learning classification technique with wide-ranging applications. Among the various kNN implementation choices, FPGA-based heterogeneous systems have gained popularity due to the FPGA's inherent parallelism, energy efficiency, and reconfigurability. However, implementing the kNN algorithm on resource-constrained embedded FPGA platforms, where limited programmable resources are typically shared among various application-specific hardware units, requires a kNN accelerator architecture that balances high performance, hardware efficiency, and flexibility. To address this challenge, we present a kNN hardware accelerator unit designed to optimize resource utilization by using sequential (accumulation-based) rather than pipelined or parallel distance computations. The proposed architecture incorporates two key algorithmic optimizations that reduce the iteration count of the sequential distance computation loop: a dynamic lower bound that enables early termination of the distance computation, and an online element selection that maximizes partial distance growth per iteration. We further enhance the accelerator's performance by incorporating multiple optimized sequential distance computation units, each dedicated to processing a segment of the training dataset. Our experiments demonstrate that the proposed approach is scalable, making it applicable to various hardware platforms and resource constraints. In particular, when implemented on an AMD Zynq device, the proposed single-core kNN accelerator occupies a mere 5% of the FPGA's resources while delivering a speedup of 3 to 5 times over the kNN software implementation running on the accompanying ARM A9 processor. For the 8-core kNN accelerator, resource utilization stands at 30%, while the speedup factor ranges between 25 and 35.
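
The idea of terminating a sequential distance accumulation early can be sketched in software. The Python below is the classic partial-distance search, offered as an assumption-laden illustration and not as the accelerator's actual logic: it prunes a candidate as soon as its running squared distance reaches the current k-th best distance, and it does not model the paper's online element selection. The function name `knn_partial_distance` is hypothetical.

```python
import heapq


def knn_partial_distance(train, query, k):
    """Return the k nearest (squared_distance, index) pairs, sorted ascending."""
    # Max-heap of the k best candidates so far, stored as (-distance, index).
    heap = []
    for idx, point in enumerate(train):
        # Pruning threshold: the current k-th best distance,
        # or infinity until k candidates have been collected.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        acc = 0.0
        pruned = False
        for a, b in zip(point, query):
            acc += (a - b) ** 2  # sequential, accumulation-based distance
            if acc >= bound:     # cannot beat the k-th best: stop early
                pruned = True
                break
        if pruned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-acc, idx))
        else:
            heapq.heapreplace(heap, (-acc, idx))
    return sorted((-d, i) for d, i in heap)


train = [(0, 0), (1, 1), (5, 5), (0.5, 0)]
print(knn_partial_distance(train, (0, 0), 2))  # [(0.0, 0), (0.25, 3)]
```

In this example the point `(5, 5)` is rejected after its first coordinate, because the partial sum 25 already exceeds the current second-best distance of 2. Visiting the coordinates that contribute most to the distance first would trigger this exit sooner, which is the intuition behind choosing elements that maximize partial distance growth.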

Citations: 0
Mixture-of-Rookies: Saving DNN computations by predicting ReLU outputs
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-30 DOI: 10.1016/j.micpro.2024.105087
Dennis Pinto, Jose-María Arnau, Marc Riera, Josep-Llorenç Cruz, Antonio González

Deep Neural Networks (DNNs) are widely used in many application domains. However, they require a vast number of computations and memory accesses to deliver outstanding accuracy. In this paper, we propose a scheme to predict whether the output of each ReLU-activated neuron will be zero or positive, in order to skip the computation of neurons that will likely output zero. Our predictor, named Mixture-of-Rookies, combines two inexpensive components. The first exploits the high linear correlation between binarized (1-bit) and full-precision (8-bit) dot products, whereas the second clusters together neurons that tend to output zero at the same time. We propose a novel clustering scheme based on the analysis of angles, as the sign of the dot product of two vectors depends on the cosine of the angle between them. We implement our hybrid zero-output predictor on top of a state-of-the-art DNN accelerator. Experimental results show that our scheme introduces a small area overhead of 5.3% while achieving a 1.2x speedup and reducing energy consumption by 16.5% on average across a set of diverse DNNs.
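
The first predictor component can be pictured with a small software model. The sketch below is an illustration under assumptions, not the paper's predictor: it binarizes weights and inputs to their signs, computes a cheap 1-bit dot product, and skips the full-precision multiply-accumulate when that cheap result falls below a threshold. The sign-based binarization, the default threshold of 0, and the names `binarized_dot` and `relu_neuron` are all hypothetical choices; the clustering component is not modeled.

```python
def sign(x):
    """Binarize a value to +1 or -1 (zero maps to +1)."""
    return 1 if x >= 0 else -1


def binarized_dot(weights, inputs):
    """1-bit dot product: sum of sign agreements minus disagreements."""
    return sum(sign(w) * sign(x) for w, x in zip(weights, inputs))


def relu_neuron(weights, inputs, threshold=0):
    """Return (output, skipped).

    If the binarized dot product is below `threshold`, the neuron is
    predicted to output zero and the full-precision work is skipped.
    """
    if binarized_dot(weights, inputs) < threshold:
        return 0, True
    acc = sum(w * x for w, x in zip(weights, inputs))
    return max(acc, 0), False


weights = [3, -2, 5, -1]
skipped_case = relu_neuron(weights, [-4, 6, -7, 2])
computed_case = relu_neuron(weights, [4, -6, 7, 2])
```

A predictor of this kind can be wrong in both directions. Predicting zero for a neuron whose true output is positive changes the network's result, so the threshold trades skipped work against accuracy.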

Citations: 0
PyIgH: A unified architecture of IgH EtherCAT Master based on Python considering hard real-time constraints
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-19 DOI: 10.1016/j.micpro.2024.105085
Raimarius Delgado, Se Yeon Cho, Byoung Wook Choi

The increasing demand for rapid application development tools, especially those employing high-level languages such as Python, has underscored the importance of using a wide array of popular libraries while addressing real-time constraints in distributed hardware systems. This paper introduces PyIgH, a unified architecture of an IgH EtherCAT master based on Python, specifically designed to satisfy hard real-time requirements in an EtherCAT network. Implemented as a Python module, PyIgH exposes the functionality of an open-source EtherCAT master, facilitating seamless configuration and control of EtherCAT slave devices within the Python runtime environment. A real-time adaptation of the POSIX library, encapsulated within Python, is also used to satisfy the timing requirements of EtherCAT. The feasibility of the proposed approach is verified by analyzing the real-time performance, in terms of periodicity and in-controller delay, of an EtherCAT control task with a 1 kHz cycle. Experimental results demonstrate that PyIgH is suitable for hard real-time applications and serves as a valid alternative to conventional low-level EtherCAT masters. Additionally, a practical application involving motion control of a six-axis collaborative robot showcases the consistent performance of PyIgH within a real-time multi-tasking environment.
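
The periodicity analysis mentioned above can be sketched with the Python standard library alone. The example below is not the PyIgH API, which the abstract does not describe. It is a hedged illustration of a 1 kHz cyclic loop driven by absolute deadlines, plus a helper that summarizes period jitter. The names `run_cyclic` and `periodicity_stats` are hypothetical, and plain `time.sleep` offers no hard real-time guarantee, so measured jitter on a stock interpreter and kernel will be far worse than on a real-time setup.

```python
import time

PERIOD_NS = 1_000_000  # 1 kHz cycle, as in the abstract


def run_cyclic(task, cycles, period_ns=PERIOD_NS):
    """Call `task` once per period and return the wake-up timestamps (ns)."""
    wakeups = []
    next_deadline = time.monotonic_ns() + period_ns
    for _ in range(cycles):
        remaining = next_deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        wakeups.append(time.monotonic_ns())
        task()
        # Advancing an absolute deadline avoids accumulating drift.
        next_deadline += period_ns
    return wakeups


def periodicity_stats(wakeups, period_ns=PERIOD_NS):
    """Return (mean period, maximum absolute jitter), both in ns."""
    periods = [b - a for a, b in zip(wakeups, wakeups[1:])]
    mean = sum(periods) / len(periods)
    max_jitter = max(abs(p - period_ns) for p in periods)
    return mean, max_jitter


stamps = run_cyclic(lambda: None, cycles=5)
mean_ns, jitter_ns = periodicity_stats(stamps)
```

Scheduling against an absolute deadline rather than sleeping for a fixed interval keeps a single late wake-up from shifting every later cycle.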

Citations: 0
Full wireless goniometer design with activity recognition for upper and lower limb
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-17 DOI: 10.1016/j.micpro.2024.105086
Cemil Keskinoğlu, Ahmet Aydın

People rely on their upper and lower extremities to carry out their work. The capability of these extremities may decline depending on how frequently they are used or on factors such as age, genetics, and body weight. The joints' range of motion (ROM) is measured to evaluate this decline. Different systems, such as conventional goniometers, mobile phone applications, and sensor-based systems, can measure ROM. Still, measuring this parameter can be challenging in situations such as training or other activities involving movement. In our previous study, we developed a partially wireless goniometer and a companion 3D visualization and control GUI. However, it was difficult to mount on limbs at a distance from one another, and it could not be used on both legs to measure hip angles. Therefore, this study presents a fully wireless goniometer system that measures joint movements of the upper and lower extremities in real time and simultaneously displays them in a 3D model. The angle values required for ROM were measured with two IMU sensors. Two ESP32s were used as microcontrollers, and a fully wireless system was achieved by transferring data via ESP-NOW and Bluetooth. Thanks to ESP-NOW, the system has lower latency than with other protocols and can transmit data over longer distances. The developed system can also perform activity recognition, which is not available in other goniometers. The system's measurements were compared with those of a conventional goniometer, and the results were found to be completely correlated (ρc = 1).
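
The agreement figure at the end of the abstract can be made concrete with a short calculation. The sketch below assumes that ρc denotes Lin's concordance correlation coefficient, which the abstract does not state explicitly, and computes it for two series of angle readings. The function name `concordance_ccc` and the sample angle values are hypothetical and are not data from the study.

```python
def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient for paired readings.

    rho_c = 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2),
    using population (1/n) variances and covariance.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


reference = [0, 30, 60, 90]          # hypothetical goniometer angles (degrees)
identical = concordance_ccc(reference, [0, 30, 60, 90])
offset = concordance_ccc(reference, [5, 35, 65, 95])
```

Unlike the Pearson coefficient, this measure penalizes a constant offset: the second series is perfectly linearly related to the reference yet scores below 1, so a value of 1 indicates agreement in value and not only in trend.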

Citations: 0
Count overflow and privilege mode filtering extension implementation on a RISC-V on-board processor
IF 1.9 Zone 4 Computer Science Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date: 2024-07-14 DOI: 10.1016/j.micpro.2024.105084
Andrea Fernández Gallego, Miguel Jiménez Arribas, Iván Gamino del Río, Agustín Martínez Hellín, Manuel Prieto Mateo, Óscar Rodríguez Polo, Antonio da Silva, Pablo Parra, Sebastián Sánchez

RISC-V is a computer architecture that has recently attracted considerable attention due to its advantageous qualities: it is an open instruction set based on reduced, simple instructions. For this reason, it has become an appealing choice for a wide range of computing applications and a disruptive force in a wide variety of fields, including those that involve the development of safety-critical software, as in the space sector. The ability to evaluate the activities performed within a processor is of paramount importance in this type of system to ensure that requirements are fulfilled during space missions. These events inside the processor are monitored by an instrument called the Hardware Performance Monitor (HPM). This work presents an implementation of the Sscofpmf extension of the HPM, compliant with the RISC-V privileged specification. The paper details the redesign of the existing performance counters from a previously implemented RISC-V baseline version. A comparison of resource utilization and power consumption between the two versions is also provided. As expected, the version with the Sscofpmf extension has higher resource utilization. Nevertheless, the paper shows that the additional functionality has been validated without any change to the processor clock frequency, so the extension does not introduce any performance overhead.
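
The behavior that the Sscofpmf extension adds to a performance counter can be modeled in a few lines. The following Python sketch is a simplified software model, not the paper's hardware implementation: it uses the event-selector bit positions defined by the extension (OF at bit 63, MINH at bit 62, SINH at bit 61, UINH at bit 60), suppresses counting in inhibited privilege modes, and reports an overflow interrupt request only when the counter wraps while OF is clear. The class name `HpmCounter`, the 64-bit counter width, and the omission of the virtualization inhibit bits are assumptions made for brevity.

```python
OF = 1 << 63    # overflow status, also acts as interrupt disable once set
MINH = 1 << 62  # inhibit counting in M-mode
SINH = 1 << 61  # inhibit counting in S-mode
UINH = 1 << 60  # inhibit counting in U-mode
INHIBIT = {"M": MINH, "S": SINH, "U": UINH}
MASK64 = (1 << 64) - 1


class HpmCounter:
    """Toy model of one counter and its event selector register."""

    def __init__(self, event_cfg=0, value=0):
        self.event_cfg = event_cfg
        self.value = value & MASK64

    def tick(self, mode):
        """Count one event seen in `mode`.

        Returns True if an overflow interrupt would be requested.
        """
        if self.event_cfg & INHIBIT[mode]:
            return False  # privilege mode filtering: event not counted
        self.value = (self.value + 1) & MASK64
        if self.value != 0:
            return False  # no wrap-around yet
        if self.event_cfg & OF:
            return False  # already flagged: no new interrupt request
        self.event_cfg |= OF
        return True


counter = HpmCounter(event_cfg=UINH, value=MASK64 - 1)
results = [counter.tick("U"), counter.tick("M"), counter.tick("S")]
```

In this run the U-mode event is filtered out, the M-mode event brings the counter to its maximum value, and the S-mode event wraps it to zero and raises the single overflow request. Software would clear OF and reload the counter before the next sampling interval.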

Citations: 0