
Microprocessors and Microsystems: Latest Publications

Algorithms for scheduling CNNs on multicore MCUs at the neuron and layer levels
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-01 | DOI: 10.1016/j.micpro.2024.105107
Convolutional neural networks (CNNs) are progressively deployed on embedded systems, which is challenging because their computational and energy requirements need to be satisfied by devices with limited resources and power supplies. For instance, they can be implemented in the Internet of Things or edge computing, i.e., in applications using low-power and low-performance microcontroller units (MCUs). Monocore MCUs are not tailored to respond to the computational and energy requirements of CNNs due to their limited resources, but a multicore MCU can overcome these limitations. This paper presents an empirical study analysing three algorithms for scheduling CNNs on embedded systems at two different levels (neuron and layer levels) and evaluates their performance in terms of makespan and energy consumption using six neural networks, both in general and in the case of CubeSats. The results show that the SNN algorithm outperforms the other two algorithms (STD and STS) and that scheduling at the layer level significantly reduces the energy consumption. Therefore, embedded systems based on multicore MCUs are suitable for executing CNNs, and they can be used, for example, on board small satellites called CubeSats.
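
The abstract does not spell out the SNN, STD, and STS schedulers, so the sketch below is only a rough, hypothetical illustration of neuron-level scheduling on a multicore MCU: each layer's neurons are spread over the cores, layers run back to back, and the makespan is the sum of the per-layer times. Layer sizes, per-neuron costs, and the core count are invented for the example.

```c
/* Hypothetical illustration of neuron-level scheduling of a CNN on a
 * multicore MCU: the neurons of each layer are spread over the cores,
 * layers run one after another (layer l+1 needs the outputs of layer l),
 * and the makespan is the sum over layers of the busiest core's load.
 * Costs and core count are invented for the example; this is not the
 * paper's SNN/STD/STS algorithm. */
#include <stdio.h>

#define NUM_CORES 4
#define NUM_LAYERS 5

int main(void) {
    /* neurons[l] = number of neurons in layer l, cost[l] = cycles per neuron */
    int neurons[NUM_LAYERS] = {256, 128, 64, 32, 10};
    int cost[NUM_LAYERS]    = {40,  60,  80, 90, 50};

    long makespan = 0;
    for (int l = 0; l < NUM_LAYERS; l++) {
        long core_load[NUM_CORES] = {0};
        /* round-robin assignment of the layer's neurons to the cores */
        for (int n = 0; n < neurons[l]; n++)
            core_load[n % NUM_CORES] += cost[l];

        long layer_time = 0;   /* the layer finishes when its slowest core does */
        for (int c = 0; c < NUM_CORES; c++)
            if (core_load[c] > layer_time) layer_time = core_load[c];

        printf("layer %d: %ld cycles\n", l, layer_time);
        makespan += layer_time;   /* layers execute sequentially */
    }
    printf("total makespan: %ld cycles\n", makespan);
    return 0;
}
```

Layer-level scheduling would instead treat whole layers as the schedulable units; the abstract reports that this coarser granularity significantly reduces energy consumption.
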
Citations: 0
Low-cost constant time signed digit selection for most significant bit first multiplication
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-11-01 | DOI: 10.1016/j.micpro.2024.105118
Serial binary multiplication is frequently used in many digital applications. In particular, left-to-right (aka online) manipulation of operands promotes the real-time generation of product digits for immediate utilization in subsequent online computations (e.g., successive layers of a neural network). In left-to-right arithmetic operations, where a residual is maintained for digit selection, utilization of a redundant number system for the representation of outputs is mandatory, while the input operands and the residual may be redundant or non-redundant. However, when the input data paths are narrow (e.g., eight bits as in BFloat16), conventional non-redundant representations of inputs and residual provide some advantages. For example, they allow immediate and costless sign detection of the residual, which is necessary for the next digit selection; a property not shared by redundant numbers. Nevertheless, digit selection as practiced in previous realizations, with both redundant and non-redundant inputs and/or residual, is slow and rather complex. Therefore, in this paper we offer an imprecise but faster digit selection scheme, with the required correction applied in the next cycle. Analytical evaluations and synthesis of the proposed circuits on an FPGA platform show a 30% speedup and lower cost with respect to both the redundant and the non-redundant cases for inputs and residual.
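
The proposed selection circuit itself is not detailed in the abstract; as a small refresher on the redundant signed-digit representation that such left-to-right multipliers emit, the sketch below recodes an integer into digits from {-1, 0, +1} (non-adjacent form) and checks that the value is preserved. It illustrates the redundancy of the output digit set only, not the paper's imprecise-select-then-correct scheme.

```c
/* Refresher on the redundant signed-digit representation used by
 * left-to-right multipliers: recode an unsigned integer into digits from
 * {-1, 0, +1} (non-adjacent form) and verify that the digits still encode
 * the same value.  This shows the representation only, not the paper's
 * imprecise digit-selection circuit. */
#include <stdio.h>

#define MAX_DIGITS 64

int main(void) {
    unsigned long x = 237;            /* example operand */
    unsigned long n = x;
    int digit[MAX_DIGITS];
    int len = 0;

    while (n != 0 && len < MAX_DIGITS) {
        int d = 0;
        if (n & 1UL) {                /* odd: emit +1 or -1 so that n becomes 0 mod 4 */
            d = ((n & 3UL) == 1UL) ? 1 : -1;
            if (d == 1) n -= 1UL; else n += 1UL;
        }
        digit[len++] = d;             /* digits stored LSB first */
        n >>= 1;
    }

    long value = 0;                   /* reconstruct: sum of digit[i] * 2^i */
    for (int i = len - 1; i >= 0; i--)
        value = 2 * value + digit[i];

    printf("x = %lu, signed digits (MSB first):", x);
    for (int i = len - 1; i >= 0; i--)
        printf(" %+d", digit[i]);
    printf("\nreconstructed value = %ld\n", value);
    return 0;
}
```

For x = 237 the recoded digits are +1 0 0 0 -1 0 -1 0 +1, i.e. 256 - 16 - 4 + 1.
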
Citations: 0
Retraction notice to “A Hybrid Semantic Similarity Measurement for Geospatial Entities” [Microprocessors and Microsystems 80 (2021) 103526]
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-20 | DOI: 10.1016/j.micpro.2024.105117
{"title":"Retraction notice to “A Hybrid Semantic Similarity Measurement for Geospatial Entities” [Microprocessors and Microsystems 80 (2021) 103526]","authors":"","doi":"10.1016/j.micpro.2024.105117","DOIUrl":"10.1016/j.micpro.2024.105117","url":null,"abstract":"","PeriodicalId":49815,"journal":{"name":"Microprocessors and Microsystems","volume":null,"pages":null},"PeriodicalIF":1.9,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142531709","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SIMIL: SIMple Issue Logic for GPUs
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-09 | DOI: 10.1016/j.micpro.2024.105105
GPU architectures have become popular for executing general-purpose programs. In particular, they are some of the most efficient architectures for machine learning applications, which are among the most trendy and demanding applications nowadays.
This paper presents SIMIL (SIMple Issue Logic for GPUs), an architectural modification to the issue stage that replaces scoreboards with a Dependence Matrix to track dependencies among instructions and avoid data hazards. We show that a Dependence Matrix is more effective in the presence of repetitive use of source operands, which is common in many applications. Besides, a Dependence Matrix with minor extensions can also support a simplistic out-of-order issue. Evaluations on an NVIDIA Tesla V100-like GPU show that SIMIL provides a speed-up of up to 2.39 in some machine learning programs and 1.31 on average for various benchmarks, while it reduces energy consumption by 12.81%, with only 1.5% area overhead. We also show that SIMIL outperforms a recently proposed approach for out-of-order issue that uses register renaming.
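
The Dependence Matrix is described only at a high level above; a minimal behavioral sketch of the idea follows: bit [i][j] records that instruction i still waits on instruction j, an instruction may issue once its row is clear, and a completing instruction clears its column. The window size, single-cycle latency, and example dependency pattern are assumptions for illustration, not the SIMIL design.

```c
/* Minimal behavioral sketch of a dependence matrix for instruction issue:
 * dep[i][j] = 1 means instruction i still waits on instruction j.  An
 * instruction may issue when its row is all zero; when instruction j
 * completes, column j is cleared.  Window size, single-cycle latency and
 * the example dependency pattern are invented; this is not the SIMIL RTL. */
#include <stdio.h>

#define WINDOW 4

static int dep[WINDOW][WINDOW];

static int ready(int i) {                 /* row i empty -> no pending producer */
    for (int j = 0; j < WINDOW; j++)
        if (dep[i][j]) return 0;
    return 1;
}

static void complete(int j) {             /* broadcast completion of instruction j */
    for (int i = 0; i < WINDOW; i++)
        dep[i][j] = 0;
}

int main(void) {
    int done[WINDOW] = {0};

    /* example window: i1 reads the result of i0, i2 reads i0 and i1,
     * i3 is independent */
    dep[1][0] = 1;
    dep[2][0] = 1;
    dep[2][1] = 1;

    for (int cycle = 0; cycle < WINDOW; cycle++) {
        int issued[WINDOW], n = 0;
        for (int i = 0; i < WINDOW; i++)
            if (!done[i] && ready(i)) issued[n++] = i;

        printf("cycle %d issues:", cycle);
        for (int k = 0; k < n; k++) printf(" i%d", issued[k]);
        printf("\n");

        for (int k = 0; k < n; k++) {     /* results become visible at end of cycle */
            done[issued[k]] = 1;
            complete(issued[k]);
        }
    }
    return 0;
}
```

In the example, i0 and i3 issue together, i1 issues once i0's column has been cleared, and i2 follows one cycle later.
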
Citations: 0
A hardware architecture for single and multiple ellipse detection using genetic algorithms and high-level synthesis tools
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-09 | DOI: 10.1016/j.micpro.2024.105106
Ellipse detection techniques are often developed and validated in software environments, neglecting the critical consideration of computational efficiency and resource constraints prevalent in embedded systems. Furthermore, programmable logic devices, notably Field Programmable Gate Arrays (FPGAs), have emerged as indispensable assets for enhancing performance and expediting various processing applications. In the realm of computational efficiency, hardware implementations have the flexibility to tailor the required arithmetic for various applications using fixed-point representation. This approach enables faster computation while upholding adequate accuracy, resulting in lower resource and energy consumption than software implementations that rely on higher clock speeds. Additionally, hardware solutions provide portability and are suitable for resource-constrained and battery-powered applications. This study introduces a novel hardware architecture in the form of an intellectual property core that harnesses the capabilities of a genetic algorithm to detect single and multiple ellipses in digital images. In general, genetic algorithms have been demonstrated to be an alternative that yields better results than traditional methods such as the Hough Transform and Random Sample Consensus, particularly in terms of accuracy, flexibility, and robustness. Our genetic algorithm randomly takes five edge points as parameters from the tested image, creating an individual treated as a potential candidate ellipse. The fitness evaluation function determines whether the candidate ellipse truly exists in the image space. The core is designed using Vitis High-Level Synthesis (HLS), a powerful tool that converts C or C++ functions into Register-Transfer Level (RTL) code, including VHDL and Verilog. The implementation and testing of the ellipse detection system were carried out on the PYNQ-Z1, a cost-effective development board housing the Xilinx Zynq-7000 System-on-Chip (SoC). PYNQ, an open-source framework, seamlessly integrates programmable logic with a dual-core ARM Cortex-A9 processor, offering the flexibility of Python programming for the onboard SoC processor. The experimental results, based on synthetic and real images (some of them containing noise) processed by the developed ellipse detection system, highlight the intellectual property core’s exceptional suitability for resource-constrained embedded systems. Notably, it achieves remarkable performance and accuracy rates, consistently exceeding 99% in most cases. This research aims to contribute to the advancement of hardware-accelerated ellipse detection, catering to the demanding requirements of real-time applications while minimizing resource consumption.
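
The paper builds each candidate ellipse from five randomly drawn edge points and scores it in image space; as a simplified stand-in for that fitness step, the sketch below scores a candidate given directly in parametric form (center, semi-axes, rotation) by the fraction of edge points lying within a tolerance of it. The parametrization, tolerance, and synthetic points are assumptions; the five-point construction and the fixed-point HLS implementation are not reproduced.

```c
/* Simplified fitness evaluation for a candidate ellipse, in the spirit of
 * the GA described above: the score is the fraction of edge points that
 * lie (within a tolerance) on the candidate.  The candidate here is given
 * directly as (center, semi-axes, rotation); the paper instead derives it
 * from five randomly chosen edge points, which is not reproduced. */
#include <stdio.h>
#include <math.h>

typedef struct { double cx, cy, a, b, theta; } ellipse_t;

/* returns 1.0 on the ellipse, <1 inside, >1 outside */
static double implicit_value(const ellipse_t *e, double x, double y) {
    double dx = x - e->cx, dy = y - e->cy;
    double c = cos(e->theta), s = sin(e->theta);
    double u =  dx * c + dy * s;          /* rotate into the ellipse frame */
    double v = -dx * s + dy * c;
    return (u * u) / (e->a * e->a) + (v * v) / (e->b * e->b);
}

static double fitness(const ellipse_t *e, double pts[][2], int n, double tol) {
    int hits = 0;
    for (int i = 0; i < n; i++)
        if (fabs(implicit_value(e, pts[i][0], pts[i][1]) - 1.0) < tol)
            hits++;
    return (double)hits / (double)n;
}

int main(void) {
    ellipse_t cand = {50.0, 40.0, 20.0, 10.0, 0.3};
    /* a few synthetic "edge points": four on the ellipse, two off it */
    double pts[6][2];
    for (int k = 0; k < 4; k++) {
        double t = k * 1.3;
        double u = cand.a * cos(t), v = cand.b * sin(t);
        pts[k][0] = cand.cx + u * cos(cand.theta) - v * sin(cand.theta);
        pts[k][1] = cand.cy + u * sin(cand.theta) + v * cos(cand.theta);
    }
    pts[4][0] = 10.0; pts[4][1] = 10.0;
    pts[5][0] = 90.0; pts[5][1] = 70.0;

    printf("fitness = %.2f\n", fitness(&cand, pts, 6, 0.05));
    return 0;
}
```

A GA would keep the highest-scoring candidates; here the candidate hits 4 of the 6 points, i.e. a fitness of about 0.67.
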
Citations: 0
Tuning high-level synthesis SpMV kernels in Alveo FPGAs
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-10-01 | DOI: 10.1016/j.micpro.2024.105104
Sparse Matrix-Vector Multiplication (SpMV) is an essential operation in scientific and engineering fields, with applications in areas like finite element analysis, image processing, and machine learning. To address the need for faster and more energy-efficient computing, this paper investigates the acceleration of SpMV through Field-Programmable Gate Arrays (FPGAs), leveraging High-Level Synthesis (HLS) for design simplicity. Our study focuses on the AMD-Xilinx Alveo U280 FPGA, assessing the performance of the SpMV kernel from Vitis Libraries, which is the state of the art on SpMV acceleration on FPGAs. We explore kernel modifications, transition to single precision, and varying partition sizes, demonstrating the impact of these changes on execution time. Furthermore, we investigate matrix preprocessing techniques, including Reverse Cuthill-McKee (RCM) reordering and a hybrid sparse storage format, to enhance efficiency. Our findings reveal that the performance of FPGA-accelerated SpMV is influenced by matrix characteristics, by smaller partition sizes, and by specific preprocessing techniques delivering notable performance improvements. By selecting the best results from these experiments, we achieved execution time enhancements of up to 3.2×. This study advances the understanding of FPGA-accelerated SpMV, providing insights into key factors that impact performance and potential avenues for further improvement.
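
For readers unfamiliar with the kernel being tuned, the sketch below shows the baseline single-precision CSR SpMV loop nest (y = A·x) that HLS implementations typically start from. The tiny matrix is hand-written for the example, and none of the Vitis Libraries code, HLS pragmas, or the RCM/hybrid-format preprocessing is reproduced.

```c
/* Baseline CSR sparse matrix-vector product (y = A*x) in single precision,
 * the kind of loop nest that HLS SpMV kernels typically start from.  The
 * tiny matrix is hand-written for the example; no Vitis Libraries code or
 * HLS pragmas are reproduced here. */
#include <stdio.h>

#define ROWS 4

int main(void) {
    /* 4x4 sparse matrix in CSR form:
       [10  0  0  2]
       [ 0  3  0  0]
       [ 1  0  5  0]
       [ 0  0  0  7] */
    int   row_ptr[ROWS + 1] = {0, 2, 3, 5, 6};
    int   col_idx[6]        = {0, 3, 1, 0, 2, 3};
    float values[6]         = {10.0f, 2.0f, 3.0f, 1.0f, 5.0f, 7.0f};
    float x[ROWS]           = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[ROWS];

    for (int r = 0; r < ROWS; r++) {
        float acc = 0.0f;
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; k++)
            acc += values[k] * x[col_idx[k]];   /* irregular gather on x */
        y[r] = acc;
    }

    for (int r = 0; r < ROWS; r++)
        printf("y[%d] = %.1f\n", r, y[r]);
    return 0;
}
```

The indirect x[col_idx[k]] accesses are the usual performance bottleneck, which is what the partitioning and reordering experiments above target.
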
Citations: 0
SLOPE: Safety LOg PEripherals implementation and software drivers for a safe RISC-V microcontroller unit
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-19 | DOI: 10.1016/j.micpro.2024.105103

This manuscript focuses on the main safety issues of a mixed-criticality system running multiple concurrent tasks. Our concerns are guaranteeing Freedom of Interference between concurrent partitions and respecting the Worst Case Execution Time of tasks. Moreover, we are interested in the evaluation of resource budgets and the study of system behavior in case of random hardware failures. In this paper we present a set of Safety LOg PEripherals (SLOPE): Performance Monitoring Unit (PMU), Execution Tracing Unit (ETU), Error Management Unit (EMU), Time Management Unit (TMU) and Data Log Unit (DLU); then, an implementation of SLOPE on a single-core RISC-V architecture is proposed. Such peripherals are able to collect software and hardware information about execution, and eventually trigger recovery actions to mitigate possible dangerous misbehavior. We show results of the hardware implementation and software testing of the units with a dedicated software library. For the PMU we standardized the software layer according to the embedded Performance Application Programming Interface (ePAPI), and compared its functionality with a bare-metal use of the library. To test the ETU we compared the hardware simulation results with software ones, to understand whether overflow may occur in internal hardware buffers during tracing. In conclusion, the designed devices introduce new instruments for system investigation in RISC-V technologies and can generate an execution profile for safety-related tasks.
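
The ETU buffering question raised above (whether tracing can overflow internal buffers) can be reasoned about with a simple behavioral model; the sketch below pushes trace records into a bounded FIFO at one rate, drains it at another, and counts the records that would be lost. Depth, rates, and record contents are invented; this is neither the ETU RTL nor its driver.

```c
/* Host-side behavioral model of a bounded trace buffer, of the kind used
 * to reason about whether an execution-tracing unit can overflow: records
 * are pushed as events occur and drained at a fixed rate; pushes into a
 * full buffer are counted as overflows.  Depth, rates and record contents
 * are assumptions for illustration, not the ETU's actual parameters. */
#include <stdio.h>

#define DEPTH 8

typedef struct {
    unsigned buf[DEPTH];
    int head, tail, count;
    int overflows;
} trace_fifo_t;

static void push(trace_fifo_t *f, unsigned record) {
    if (f->count == DEPTH) { f->overflows++; return; }   /* record lost */
    f->buf[f->head] = record;
    f->head = (f->head + 1) % DEPTH;
    f->count++;
}

static int pop(trace_fifo_t *f, unsigned *record) {
    if (f->count == 0) return 0;
    *record = f->buf[f->tail];
    f->tail = (f->tail + 1) % DEPTH;
    f->count--;
    return 1;
}

int main(void) {
    trace_fifo_t f = { {0}, 0, 0, 0, 0 };
    unsigned r;

    /* produce 3 trace records per step, drain 2 per step */
    for (int step = 0; step < 20; step++) {
        for (int i = 0; i < 3; i++) push(&f, (unsigned)(step * 3 + i));
        for (int i = 0; i < 2; i++) pop(&f, &r);
    }
    printf("records left in buffer: %d, overflowed records: %d\n",
           f.count, f.overflows);
    return 0;
}
```
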

Citations: 0
RED-SEA Project: Towards a new-generation European interconnect
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-16 | DOI: 10.1016/j.micpro.2024.105102
RED-SEA is an H2020 EuroHPC project whose main objective is to prepare a new-generation European Interconnect, capable of powering the EU Exascale systems to come, through an economically viable and technologically efficient interconnect that leverages European interconnect technology (BXI) together with standard and mature technology (Ethernet), previous EU-funded initiatives, and open standards and compatible APIs.
To achieve this objective, the RED-SEA project is being carried out around four key pillars: (i) network architecture and workload requirements-interconnects co-design – aiming at optimizing the fit with the other EuroHPC projects and with the EPI processors; (ii) development of a high-performance, low-latency, seamless bridge with Ethernet; (iii) efficient network resource management, including congestion and Quality-of-Service; and (iv) end-to-end functions implemented at the network edges.
This paper presents key achievements and results at the midterm of the project for each key pillar on the way to reaching the final project objective. In this regard we can highlight: (i) the definition of the network requirements and architecture as well as a list of benchmarks and applications; (ii) in addition to the progress on the initially planned IPs, the evolution of the BXI3 architecture to natively support Ethernet at a low level, resulting in reduced complexity, with advantages in terms of cost optimization and power consumption; (iii) the congestion characterization of target applications and proposals to reduce this congestion by the optimization of collective communication primitives, injection throttling and adaptive routing; and (iv) the low-latency, high-message-rate endpoint functions and their connection with new open technologies.
Citations: 0
Recent advances in Machine Learning based Advanced Driver Assistance System applications
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-09-12 | DOI: 10.1016/j.micpro.2024.105101

In recent years, the rise of traffic in modern cities has demanded novel technology to support drivers and protect passengers and other third parties involved in transportation. Thanks to rapid technological progress and innovations, many Advanced Driver Assistance Systems (A/DAS) based on Machine Learning (ML) algorithms have emerged to address the increasing demand for practical A/DAS applications. Fast and accurate execution of A/DAS algorithms is essential for preventing loss of life and property. High-speed hardware accelerators are vital for processing the high volume of data captured by increasingly sophisticated sensors and for executing the complex mathematical models of modern deep learning (DL) algorithms. One of the fundamental challenges in this new era is to design energy-efficient and portable ML-enabled platforms for vehicles to provide driver assistance and safety. This article presents recent progress in ML-driven A/DAS technology to offer new insights for researchers. We cover standard ML models and optimization approaches based on widely accepted open-source frameworks extensively used in A/DAS applications. We also highlight related articles on ML and its sub-branches, neural networks (NNs) and DL, and report implementation issues, benchmarking problems, and potential challenges for future research. Popular embedded hardware platforms used to implement A/DAS applications, such as Field Programmable Gate Arrays (FPGAs), central processing units (CPUs), Graphical Processing Units (GPUs), and Application Specific Integrated Circuits (ASICs), are also compared in terms of their performance and resource utilization. We examine the hardware and software development environments used in implementing A/DAS applications and report their advantages and disadvantages. We provide performance comparisons of common A/DAS tasks such as traffic sign recognition, road and lane detection, vehicle and pedestrian detection, driver behavior, and multitasking. Considering the current research dynamics, A/DAS will remain one of the most popular application fields for vehicular transportation in the near future.

Citations: 0
Proactive deadlock prevention based on traffic classification sub-graphs for triplet-based NoC TriBA-cNoC
IF 1.9 | CAS Zone 4, Computer Science | Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2024-08-31 | DOI: 10.1016/j.micpro.2024.105091

Network topology and routing algorithms stand as pivotal decision points that profoundly impact the performance of Network-on-Chip (NoC) systems. As core counts rise, so does the inherent competition for shared resources, spotlighting the critical need for meticulously designed routing algorithms that circumvent deadlocks to ensure optimal network efficiency. This research capitalizes on the Triplet-Based Architecture (TriBA) and its Distributed Minimal Routing Algorithm (DM4T) to overcome the limitations of previous approaches. While DM4T exhibits performance advantages over previous routing algorithms, its deterministic nature and potential for circular dependencies during routing can lead to deadlocks and congestion. Therefore, this work addresses these vulnerabilities while leveraging the performance benefits of TriBA and DM4T. It introduces a novel approach that merges a proactive deadlock prevention mechanism with Intermediate Adjacent Shortest Path Routing (IASPR). This combination guarantees both deadlock-free and livelock-free routing, ensuring reliable communication within the network. The key to this integration lies in a flow model-based data transfer categorization technique. This technique prevents the formation of circular dependencies. Additionally, it reduces redundant distance calculations during the routing process. By addressing these challenges, the proposed approach achieves improvements in both routing latency and throughput. To rigorously assess the performance of TriBA network topologies under varying configurations, extensive simulations were undertaken. The investigation encompassed both TriBA networks comprising 9 nodes and those with 27 nodes, employing the DM4T and IASPR routing algorithms and the proactive deadlock prevention method. The gem5 simulator, operating under the Garnet 3.0 network model using a standalone protocol for synthetic traffic patterns, was utilized for simulations at high injection rates, spanning diverse synthetic traffic patterns and PARSEC benchmark suite applications. Simulations rigorously quantified the effectiveness of the proposed approach, revealing reductions in average latency of 40.17% and 34.05% compared to the lookup table and DM4T, respectively. Additionally, there were notable increases in average throughput of 7.48% and 5.66%.
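
The proactive prevention described above rests on avoiding circular dependencies among channels; a standard way to check a routing function for that property is to build its channel dependency graph and test it for cycles. The sketch below runs that test with a depth-first search on a made-up four-channel graph; it is neither TriBA's topology nor the paper's traffic-classification sub-graphs.

```c
/* Standard deadlock-freedom check for a routing function: build the
 * channel dependency graph (an edge u -> v means a packet holding channel
 * u may request channel v) and test it for cycles with a DFS.  The graph
 * below is a made-up example; it is not TriBA's topology nor the paper's
 * traffic-classification sub-graphs. */
#include <stdio.h>

#define CHANNELS 4

static int adj[CHANNELS][CHANNELS];
static int color[CHANNELS];           /* 0 = unvisited, 1 = on stack, 2 = done */

static int has_cycle_from(int u) {
    color[u] = 1;
    for (int v = 0; v < CHANNELS; v++) {
        if (!adj[u][v]) continue;
        if (color[v] == 1) return 1;              /* back edge: cycle found */
        if (color[v] == 0 && has_cycle_from(v)) return 1;
    }
    color[u] = 2;
    return 0;
}

static int has_cycle(void) {
    for (int u = 0; u < CHANNELS; u++) color[u] = 0;
    for (int u = 0; u < CHANNELS; u++)
        if (color[u] == 0 && has_cycle_from(u)) return 1;
    return 0;
}

int main(void) {
    /* acyclic dependency set: c0->c1, c1->c2, c0->c3 */
    adj[0][1] = adj[1][2] = adj[0][3] = 1;
    printf("acyclic case: %s\n", has_cycle() ? "cycle (deadlock possible)"
                                             : "no cycle (deadlock-free)");

    /* closing the loop c2->c0 creates a circular dependency */
    adj[2][0] = 1;
    printf("with c2->c0:  %s\n", has_cycle() ? "cycle (deadlock possible)"
                                             : "no cycle (deadlock-free)");
    return 0;
}
```

In the example the first dependency set is acyclic, so no deadlock cycle exists; adding the edge c2 -> c0 closes a cycle and the check reports it.
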

Citations: 0