
Microprocessors and Microsystems: Latest Articles

Fault tolerant voting circuits: A Dual-Modular-Redundancy approach for Single-Event-Transient mitigation
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-12-01 | Epub Date: 2025-10-11 | DOI: 10.1016/j.micpro.2025.105207
Marcello Barbirotta, Marco Angioli, Antonio Mastrandrea, Francesco Menichelli, Marco Pisani, Mauro Olivieri
As device dimensions shrink and operating frequencies increase in modern technologies, Single Event Transient faults present significant challenges. These faults arise from susceptibility to radiation-induced errors and from decreasing feature sizes; they can propagate through logic circuits and result in incorrect system behavior, reducing reliability, particularly when they affect internal nodes of combinational voting circuits.
This paper emphasizes the importance of voting schemes focusing on specific Dual Modular Redundancy lock-step architectures where the voting system is made of a comparator with parity and a recovery signal. The study includes both theoretical and practical fault injection analyses and proposes a novel voting structure designed to reduce the failure rate to 0.4% in cases of Input-Internal faults. This achievement represents the lowest failure rate reported in the literature when compared to conventional DMR lock-step comparators and Self voter approaches without filtering mechanisms. The proposed solution significantly enhances fault resilience, with only a slight increase in hardware utilization and frequency performance.
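To make the idea of a "comparator with parity and a recovery signal" concrete, the following is a minimal behavioral sketch in Python. It is not the voting structure proposed in the paper, whose details the abstract does not give. The function names, the even-parity convention, and the policy of forwarding the copy whose parity still checks are all assumptions made for illustration.

```python
def parity(word: int) -> int:
    """Even-parity bit of a non-negative integer word (1 if the popcount is odd)."""
    return bin(word).count("1") & 1


def dmr_vote(out_a: int, par_a: int, out_b: int, par_b: int):
    """Compare two lock-step outputs, each accompanied by its own parity bit.

    Returns (value, recovery). `value` is the word to forward, or None when
    neither copy can be trusted. `recovery` is True when the two cores should
    be resynchronized (hypothetical policy, chosen for this sketch).
    """
    ok_a = parity(out_a) == par_a
    ok_b = parity(out_b) == par_b
    if out_a == out_b and ok_a and ok_b:
        return out_a, False   # copies agree and both parities check
    if ok_a and not ok_b:
        return out_a, True    # copy B is suspect: forward A, request recovery
    if ok_b and not ok_a:
        return out_b, True    # copy A is suspect: forward B, request recovery
    return None, True         # ambiguous mismatch: block the output, recover


# Fault-free case, then a single bit flip in copy B.
print(dmr_vote(0b1011, 1, 0b1011, 1))
print(dmr_vote(0b1011, 1, 0b1010, 1))
```

A model like this only covers faults on the voter inputs. The abstract's concern with internal nodes of the voting circuit would require injecting faults into the comparator logic itself, which this sketch does not attempt.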
Microprocessors and Microsystems, vol. 119, Article 105207
Citations: 0
Power/accuracy-aware dynamic workload optimization combining application autotuning and runtime resource management on homogeneous architectures
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-12-01 | Epub Date: 2025-10-20 | DOI: 10.1016/j.micpro.2025.105219
Roberto Rocco, Francesco Gianchino, Antonio Miele, Gianluca Palermo
Nowadays, most computing systems experience highly dynamic workloads with performance-demanding applications entering and leaving the system with an unpredictable trend. Ensuring their performance guarantees led to the design of adaptive mechanisms, including (i) application autotuners, able to optimize algorithmic parameters (e.g., frame resolution in a video processing application), and (ii) runtime resource management to distribute computing resources among the running applications and tune architectural knobs (e.g., frequency scaling). Past work investigates the two directions separately, acting on a limited set of control knobs and objective functions; instead, this work proposes a combined framework to integrate these two complementary approaches in a single two-level governor acting on the overall hardware/software stack. The resource manager incorporates a policy for computing resource distribution and architectural knobs to guarantee the required performance of each application while limiting the side effect on results quality and minimizing system power consumption. Meanwhile, the autotuner manages the applications’ software knobs, ensuring results’ quality and performance constraint satisfaction while hiding application details from the controller. Experimental evaluation carried out on a homogeneous architecture for workstation machines demonstrates that the proposed framework is stable and can save more than 72% of the power consumed by one-layer solutions.
Microprocessors and Microsystems, vol. 119, Article 105219
Citations: 0
DSL-based SNN accelerator design using Chisel
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-09-03 | DOI: 10.1016/j.micpro.2025.105187
Patrick Plagwitz, Frank Hannig, Jürgen Teich, Oliver Keszocze
Neural Networks (NNs) are a very active field of research that also has wide-ranging applications in industry. An emerging type of NN that is promising for hardware acceleration and low energy requirements is the Spiking Neural Network (SNN). However, design automation for accelerator circuit generation still lacks proper search techniques for optimizing network parameters, including the selection of suitable neuron models and spike encodings. Existing approaches are often restricted to implementing a single network setting and/or a fixed hardware architecture.
In this paper, we present a novel multi-layer Domain-Specific Language (DSL) for constructing sequential circuits, including building blocks for pipelines supporting hazard detection. As the host language, we use Chisel, a hardware construction language that allows hardware to be expressed at the Register-Transfer Level and above. In contrast to applying High-Level Synthesis, we build a DSL for SNN accelerator design on top of Chisel by defining building blocks for SNNs. After introducing this DSL, we present a full SNN accelerator generation framework that covers all phases, from training to deployment. Also proposed is a design space exploration for various SNN accelerator designs using different neuron models, their parametrizations, and spike encodings. The generated designs are evaluated in terms of execution time, power consumption, classification accuracy, and resource usage when mapped to Field-Programmable Gate Arrays (FPGAs) for the MNIST, Fashion-MNIST, SVHN, and CIFAR-10 data sets.
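The abstract refers to "neuron models" without naming one. As an illustration of what such a building block computes, here is a minimal sketch of a leaky integrate-and-fire (LIF) neuron in integer arithmetic, written in Python rather than Chisel. The choice of LIF, the shift-based leak, and every parameter value are assumptions made for this example and are not taken from the paper.

```python
def lif_step(v: int, input_current: int, leak_shift: int = 3,
             threshold: int = 64, v_reset: int = 0):
    """One discrete time step of an integer LIF neuron.

    The leak is modeled as v >> leak_shift, a form that maps to cheap
    hardware (assumed here, not taken from the paper).
    Returns (new_membrane_potential, spike).
    """
    v = v - (v >> leak_shift) + input_current   # leak, then integrate the input
    if v >= threshold:
        return v_reset, 1                       # fire and reset
    return v, 0


def run(inputs, **params):
    """Feed a sequence of input currents to one neuron and collect its spikes."""
    v, spikes = 0, []
    for current in inputs:
        v, spike = lif_step(v, current, **params)
        spikes.append(spike)
    return spikes


print(run([20] * 6))  # [0, 0, 0, 1, 0, 0]
```

With a constant input of 20, the membrane potential climbs through 20, 38, and 54, crosses the threshold at 68 on the fourth step, and resets. Varying `leak_shift` and `threshold` is one simple instance of the kind of parametrization a design space exploration could sweep.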
Microprocessors and Microsystems, vol. 118, Article 105187
Citations: 0
Extended design and linearity analysis of a 6-bit low-area hybrid ADC design for local system-on-chip measurements
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-08-26 | DOI: 10.1016/j.micpro.2025.105191
Nima Kolahimahmoudi, Giorgio Insinga, Paolo Bernardi
The low observability of analog signals inside modern low-area system-on-chips (SoCs) results in an increasing need for Design for Testability (DfT) solutions. These solutions demand an optimal circuit design in terms of area, power consumption, and precision, with a focus on minimizing the area overhead per SoC circuit block. To address this demand, we present a 6-bit, low-area Hybrid Analog-to-Digital Converter (ADC) that measures analog voltage inside SoCs locally. The proposed Hybrid ADC consists of two sub-ADCs: a 3-bit SAR ADC for coarse measurements and a 3-bit Flash ADC for fine measurements.
The advantage of the proposed ADC design is the low additional area it adds to each IP of an SoC. The fine Flash part, which dominates the area of the design, can also be shared. This ADC design converts analog signals, which are difficult to read from SoC pins, to the digital domain, where they are easy to route and observe.
The suggested ADC is designed and analyzed using the 130 nm technology of Infineon, and it has a total area of 0.007 mm². The areas of the fine Flash and coarse SAR parts are 0.0015 mm² and 0.0042 mm², respectively. The Signal-to-Noise-and-Distortion Ratio (SNDR) of the design is 37 dB, and the Figure of Merit (FoM) is 2.15 pJ/conv.
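The coarse/fine split can be illustrated with an idealized behavioral model. The Python sketch below assumes a unipolar input range of 0 to `vref`, ideal comparators, and a 3-step binary search for the SAR stage. None of these details come from the abstract, and the function names are hypothetical. The second function applies the standard ideal-quantizer relation ENOB = (SNDR - 1.76) / 6.02 to the reported 37 dB. The resulting figure is a calculation made here, not a number stated in the source.

```python
def hybrid_adc(vin: float, vref: float = 1.0) -> int:
    """Idealized 6-bit conversion: 3-bit SAR (coarse) followed by 3-bit flash (fine)."""
    # Coarse stage: successive approximation over 8 segments of width vref/8.
    coarse = 0
    for bit in (4, 2, 1):
        trial = coarse | bit
        if vin >= trial * vref / 8:
            coarse = trial
    # Fine stage: a flash ADC compares the residue against 7 thresholds
    # spaced one LSB (vref/64) apart inside the selected coarse segment.
    residue = vin - coarse * vref / 8
    lsb = vref / 64
    fine = sum(residue >= k * lsb for k in range(1, 8))
    return (coarse << 3) | fine


def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using the ideal-quantizer formula."""
    return (sndr_db - 1.76) / 6.02


print(hybrid_adc(0.3))      # 19
print(round(enob(37), 2))   # 5.85
```

Under these assumptions a 0.3 V input with a 1 V reference lands in coarse segment 2 and fine step 3, giving code 19. An SNDR of 37 dB corresponds to roughly 5.85 effective bits, close to the nominal 6-bit resolution.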
Microprocessors and Microsystems, vol. 118, Article 105191
Citations: 0
IHKEM: A post-quantum ready hierarchical key establishment and management scheme for wireless sensor networks
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-09-25 | DOI: 10.1016/j.micpro.2025.105205
Khushboo Jain, Akansha Singh
Wireless Sensor Networks (WSNs) are increasingly embedded in mission-critical infrastructures, yet their constrained resources make conventional cryptographic solutions unsuitable. Existing hierarchical key management schemes, such as the RB method, provide partial protection but remain vulnerable to impersonation, replay, and node capture attacks. To address these challenges, we propose IHKEM (Improved Hierarchical Key Establishment and Management), a lightweight yet robust protocol that integrates symmetric and asymmetric primitives for mutual authentication, dynamic session key establishment, and end-to-end confidentiality. Unlike static key distribution methods, IHKEM eliminates unilateral key control, employs nonce- and timestamp-based validation for replay resistance, and supports adaptive key refreshing to preserve forward and backward secrecy. Extensive NS-2.35 simulations demonstrate that IHKEM significantly reduces energy consumption (∼15–20% over RB), improves flexibility against node compromise (>80% uncompromised links under 15% capture), extends network lifetime (delayed FND/HND thresholds), lowers memory footprint (∼20–25% reduction), while incurring only ∼3% higher overhead compared to lightweight schemes such as SEE2PK. Beyond its immediate gains, IHKEM’s modular architecture ensures post-quantum readiness, enabling seamless integration of lattice-based key encapsulation and signature schemes. This work bridges the gap between efficiency, resilience, and long-term cryptographic sustainability in WSNs.
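The nonce- and timestamp-based validation mentioned above can be sketched generically. The Python example below is not the IHKEM protocol, whose message formats the abstract does not specify. The message layout, the 5-second freshness window, the use of HMAC-SHA256, and all names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
import time
from typing import Optional

MAX_SKEW_S = 5.0  # assumed freshness window, not a value from the paper


def _tag(key: bytes, nonce: bytes, ts: float, payload: bytes) -> bytes:
    """Authenticate nonce, timestamp, and payload together under a shared key."""
    return hmac.new(key, nonce + repr(ts).encode() + payload, hashlib.sha256).digest()


def make_message(key: bytes, payload: bytes, now: Optional[float] = None) -> dict:
    ts = time.time() if now is None else now
    nonce = os.urandom(8)
    return {"nonce": nonce, "ts": ts, "payload": payload,
            "tag": _tag(key, nonce, ts, payload)}


class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.seen = set()  # a real node would prune nonces older than the window

    def accept(self, msg: dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expected = _tag(self.key, msg["nonce"], msg["ts"], msg["payload"])
        if not hmac.compare_digest(expected, msg["tag"]):
            return False                      # forged or modified message
        if abs(now - msg["ts"]) > MAX_SKEW_S:
            return False                      # stale timestamp
        if msg["nonce"] in self.seen:
            return False                      # replayed nonce
        self.seen.add(msg["nonce"])
        return True


key = b"k" * 16
rx = Receiver(key)
msg = make_message(key, b"temp=21", now=1000.0)
print(rx.accept(msg, now=1001.0))  # True: fresh and first use of the nonce
print(rx.accept(msg, now=1002.0))  # False: same message replayed
```

The timestamp bounds how long a captured message stays usable, and the nonce cache rejects replays inside that window, so the receiver only has to remember nonces for `MAX_SKEW_S` seconds. That bound matters on memory-constrained sensor nodes.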
Microprocessors and Microsystems, vol. 118, Article 105205
Citations: 0
Design and implementation of a hardware accelerator IP core for improved lightweight deep learning model
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-09-22 | DOI: 10.1016/j.micpro.2025.105202
Wei Zeng, Yuzhou Xiao, Yiru Wang, Caihua Chen, Sulan He
Real-time multi-point, full-scene monitoring with low cost, low power consumption, low communication overhead, and front-end deployment is a current research focus in fire detection technology. This paper investigates and implements fire detection technology on the low-computation ZYNQ platform based on deep learning, aiming to provide a cost-effective, highly efficient, and reliable fire detection solution. Firstly, we propose a lightweight network model, YOLO-Fire, which incorporates modifications like replacing standard convolutions with depthwise separable convolutions, adding the ECA attention mechanism, and introducing multi-scale feature fusion to suit the memory and computational limitations of the ZYNQ device. Additionally, we designed a hardware accelerator IP core for the ZYNQ7020 platform using a specific loop tiling strategy, constraint statements, and a dual-dimensional parallel optimization of convolution input and output channels. Combined with fixed-point quantization and resource optimization, this implementation achieves efficient acceleration of convolution, pooling, and upsampling layers. Experimental results show that YOLO-Fire improves accuracy, recall, and F1-score on the BoWFire public flame dataset and a self-constructed flame dataset. Additionally, the average inference time on the ZYNQ platform is approximately 74.43 times faster than on mainstream ARM AI platforms, verifying the effectiveness of the proposed acceleration approach.
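Replacing standard convolutions with depthwise separable ones helps on a small device because it reduces both parameter count and multiply-accumulate operations. The Python sketch below compares the two for a single layer. The layer shape (3×3 kernel, 64 input channels, 128 output channels, 52×52 output map) is a hypothetical example, not a layer from YOLO-Fire. Biases are ignored, and both variants are assumed to produce the same output size.

```python
def standard_conv_cost(k: int, c_in: int, c_out: int, h_out: int, w_out: int):
    """Parameters and multiply-accumulates of a standard k x k convolution."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out
    return params, macs


def depthwise_separable_cost(k: int, c_in: int, c_out: int, h_out: int, w_out: int):
    """Cost of a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = params * h_out * w_out
    return params, macs


std_params, std_macs = standard_conv_cost(3, 64, 128, 52, 52)
dws_params, dws_macs = depthwise_separable_cost(3, 64, 128, 52, 52)
print(std_params, dws_params)            # 73728 8768
print(round(dws_params / std_params, 4)) # 0.1189
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs roughly one ninth of the weights and operations once the channel count is large. This is the kind of saving that makes a model fit the memory and compute budget of a device such as the ZYNQ7020.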
Microprocessors and Microsystems, vol. 118, Article 105202
Citations: 0
Qubit-size low-power cryogenic CMOS ICs for monolithic quantum processors
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-08-22 | DOI: 10.1016/j.micpro.2025.105192
Domenico Zito
This manuscript addresses the severe design challenge of implementing microwave and mm-wave control-and-readout ICs that enable monolithic Silicon quantum processors (QPs).
For the first time, we describe the circuit design challenge within a unitary frame and provide some general considerations about requirements, technology, and performance, as a reference for future developments. In support of the discussion and considerations, we also report some results that emerged from work envisioned and carried out within our research and development toward monolithic QPs. In particular, we address the key aspects leading to the new design paradigm that enables qubit-size low-power CMOS ICs for qubit control and readout in monolithic QPs, and we summarize the main characteristics and results exhibited by some representative key building blocks. These circuit solutions open up a new class of low-power mm-wave circuits made of a few MOSFETs, without spiral inductors or other large and lossy distributed passive components, resulting in a characteristic size close to that of our qubit devices. We refer to these as qubit-size low-power cryogenic ICs, key enabling solutions for monolithic QPs scalable to a large number of qubits.
Microprocessors and Microsystems, vol. 118, Article 105192
Citations: 0
Towards an embedded architecture based back-end processing for AGV SLAM applications
IF 2.6 | Zone 4, Computer Science | Q3, Computer Science, Hardware & Architecture | Pub Date: 2025-11-01 | Epub Date: 2025-10-03 | DOI: 10.1016/j.micpro.2025.105206
Mohammed Chghaf, Sergio Rodríguez Flórez, Abdelhafid El Ouardi
Place recognition plays a crucial role in the Simultaneous Localization and Mapping (SLAM) process of self-driving cars. Over time, motion estimation is prone to accumulating errors, leading to drift. Accurately recognizing previously visited areas through the place recognition system allows these drift errors to be corrected in real time. Recognizing places based on the structural aspects of the environment tends to be more resilient to variations in lighting, which can cause incorrect identifications when feature-based descriptors are used. Nevertheless, research has predominantly focused on using depth sensors for this purpose. Inspired by a LiDAR-based approach, we introduce an inter-modal geometric descriptor that leverages the structural information obtained through a stereo camera.
Using this descriptor, we can achieve real-time place recognition by focusing on the structural appearance of the scene derived from a 3D vision system. Our experiments on the KITTI dataset and our self-collected dataset show that the proposed approach is comparable to state-of-the-art methods while remaining low-cost. We studied the algorithm's complexity to propose an optimized parallelization on GPU and FPGA architectures. Performance evaluation on different hardware (Jetson AGX Xavier and Arria 10 SoC) shows that the real-time requirements of an embedded system are met. Compared to a CPU implementation, processing times showed a speed-up between 4x and 10x, depending on the architecture.
Citations: 0
Polynomial formal verification parameterized by cutwidth properties of a circuit using Boolean satisfiability
IF 2.6 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2025-11-01 | Epub Date: 2025-09-02 | DOI: 10.1016/j.micpro.2025.105199
Luca Müller, Rolf Drechsler
Verification is an essential step in the design process of microprocessors. Complete coverage can only be ensured by formal methods, which tend to have exponential runtimes in the general case. Polynomial Formal Verification (PFV) addresses this issue, opening a research field focused on providing formal methods that ensure 100% correctness along with predictable and manageable time and space complexity. In this work, two SAT-based verification approaches in the field of PFV are presented. For both the verification of the cutwidth decomposition on the Circuit-CNF and the verification of the cutwidth decomposition on the Circuit-AIG, it is proven that the time complexity is parameterized by the respective cutwidth. This enables the definition of a class of circuits with constant cutwidth, for which verification can be ensured in linear time. After the theoretical considerations, both approaches are experimentally evaluated in a case study on adder circuits, which underlines the established theoretical bounds. Finally, both approaches are compared and their significance in the research field of PFV is stated.
Citations: 0
Time-predictable warp scheduling in a GPU
IF 2.6 | Zone 4 (Computer Science) | Q3 (Computer Science, Hardware & Architecture) | Pub Date: 2025-11-01 | Epub Date: 2025-09-25 | DOI: 10.1016/j.micpro.2025.105203
Noïc Crouzet, Thomas Carle, Christine Rochange
This paper presents architectural design solutions aimed at improving the timing predictability of GPU pipelines, with a particular focus on the behavior of hardware schedulers in the fetch and issue stages. We argue that without coordination between these schedulers at each cycle, the timing behavior of the GPU is unpredictable. We show how coordination can be enforced and prove that our solution achieves predictable behavior. We have implemented it in a modified version of the open-source Vortex GPU, synthesized for an AMD Xilinx FPGA. We evaluate the overhead of the approach in terms of both FPGA resources and execution time.
Citations: 0