
Latest Publications in IEEE Transactions on Computers

TPDA-DRAM: A Variation-Aware DRAM Improving System Performance via In-Situ Timing Margin Detection and Adaptive Mitigation
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-11-03 | DOI: 10.1109/TC.2025.3627945
Yuxuan Qin;Chuxiong Lin;Guoming Rao;Ling Yang;Weiguang Sheng;Weifeng He
DRAM latency remains a critical bottleneck in the performance of modern computing systems. However, the latency is excessively conservative due to the timing margins imposed by DRAM vendors to accommodate rare worst-case scenarios, such as weak cells and high temperatures. In this study, we introduce a temperature- and process-variation-aware timing detection and adaptation DRAM (TPDA-DRAM) architecture that dynamically mitigates timing margins at runtime. TPDA-DRAM leverages innovative in-situ cross-coupled detectors to monitor voltage differences between bitline pairs inside DRAM arrays, ensuring precise detection of timing margins. Additionally, the proposed detector inherently accelerates the precharge operation of DRAM, thereby reducing the precharge latency by up to 62.5%. Building upon this architecture, we propose two variation-aware timing adaptation schemes: 1) a process-variation-aware adaptation (PVA) scheme that accelerates access to weak cells, mitigating process-induced timing margins, and 2) a temperature-variation-aware adaptation (TVA) scheme that leverages temperature information and the restoration truncation technique to reduce DRAM latency, mitigating temperature-induced timing margins. Evaluations on an eight-core computing system show that TPDA-DRAM improves average performance by 21.8% and energy efficiency by 18.2%.
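As a rough, self-contained illustration of how the two adaptation schemes could select timings per access, here is a toy Python sketch. All timing values, the temperature threshold, and the scaling factors are hypothetical stand-ins rather than the paper's measured parameters; only the 62.5% precharge reduction comes from the abstract.

```python
# Toy illustration of variation-aware timing adaptation (not the paper's
# circuit-level mechanism). Timing values and thresholds are hypothetical.

NOMINAL = {"tRCD": 13.75, "tRP": 13.75, "tRAS": 35.0}  # ns, DDR4-like

def adapted_timings(row_is_weak: bool, temperature_c: float) -> dict:
    """Return per-access timings after PVA/TVA-style adaptation.

    PVA idea: only rows flagged weak by the in-situ detector keep the
    conservative timings; strong rows are accessed faster.
    TVA idea: at low temperature, restoration can be truncated, trimming tRAS.
    """
    t = dict(NOMINAL)
    if not row_is_weak:            # PVA: strong rows tolerate reduced margins
        t["tRCD"] *= 0.75
    t["tRP"] *= 0.375              # detector-assisted precharge (up to 62.5% faster)
    if temperature_c < 45.0:       # TVA: truncate restoration when cool
        t["tRAS"] *= 0.8
    return t

print(adapted_timings(row_is_weak=False, temperature_c=30.0))
```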
Citations: 0
CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-Based CIM Architectures
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-11-03 | DOI: 10.1109/TC.2025.3628114
Yingjie Qi;Jianlei Yang;Rubing Yang;Cenlin Duan;Xiaolin He;Ziyan He;Weitao Pan;Weisheng Zhao
Compute-in-memory (CIM) has emerged as a pivotal direction for accelerating workloads in the field of machine learning, such as Deep Neural Networks (DNNs). However, the effective exploitation of sparsity in CIM systems presents numerous challenges, due to the inherent limitations in their rigid array structures. Designing sparse DNN dataflows and developing efficient mapping strategies also become more complex when accounting for diverse sparsity patterns and the flexibility of a multi-macro CIM structure. Despite these complexities, there is still an absence of a unified systematic view and modeling approach for diverse sparse DNN workloads in CIM systems. In this paper, we propose CIMinus, a framework dedicated to cost modeling for sparse DNN workloads on CIM architectures. It provides an in-depth energy consumption analysis at the level of individual components and an assessment of the overall workload latency. We validate CIMinus against contemporary CIM architectures and demonstrate its applicability in two use-cases. These cases provide valuable insights into both the impact of sparsity patterns and the effectiveness of mapping strategies, bridging the gap between theoretical design and practical implementation.
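A cost-modeling framework of this kind ultimately counts effective operations and per-component energies. The sketch below is a minimal, hypothetical stand-in rather than CIMinus itself: the macro geometry, energy constants, and the `cim_cost` function are invented for illustration.

```python
import math

# Minimal sketch of a CIM cost model in the spirit of per-component
# accounting; all energy/latency constants here are hypothetical.

def cim_cost(nnz_weights: int, total_weights: int, macro_rows: int = 256,
             macro_cols: int = 256, n_macros: int = 4,
             e_mac_pj: float = 0.05, e_adc_pj: float = 1.2,
             t_cycle_ns: float = 10.0):
    """Estimate energy (nJ) and latency (us) of one weight-stationary layer."""
    density = nnz_weights / total_weights
    # Weights are tiled across macros; sparsity skips ineffective MACs.
    tiles = math.ceil(total_weights / (macro_rows * macro_cols))
    rounds = math.ceil(tiles / n_macros)          # sequential macro rounds
    macs = nnz_weights                            # effective operations only
    adc_reads = rounds * macro_cols * n_macros    # one ADC sample per column
    energy_nj = (macs * e_mac_pj + adc_reads * e_adc_pj) / 1000
    latency_us = rounds * macro_rows * t_cycle_ns / 1000
    return energy_nj, latency_us, density

print(cim_cost(nnz_weights=150_000, total_weights=600_000))
```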
Citations: 0
DCS3: A Dual-Layer Co-Aware Scheduler With Stealing Balance and Synchronized Priority in Virtualization Environments
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-11-03 | DOI: 10.1109/TC.2025.3628012
Chenglai Xiong;Junjie Wen;Guoqi Xie;Zhongjia Wang;Zhenli He;Shaowen Yao;Jianfeng Tan;Tianyu Zhou;Tiwei Bie;Yan Yan;Shoumeng Yan
Virtualization environments (e.g., containers and hypervisors) achieve isolation of multiple runtime entities but result in two mutually isolated guest and host layers. Such cross-layer isolation can cause high latency and low throughput. Previous aware-scheduling and double-scheduling approaches fail to achieve bidirectional coordination between the guest and host layers. To address this challenge, we develop DCS3, a Dual-layer Co-aware Scheduler that combines stealing balance and synchronized priority. Stealing balance migrates tasks between virtual CPU (vCPU) queues, based on the workloads of physical CPUs (pCPUs), to balance load. Synchronized priority dynamically adjusts the priorities of the threads running on the pCPUs according to the current vCPU workloads. The vCPUs and pCPUs belong to the guest and host layers, respectively. Compared with aware scheduling, double scheduling, and DCS2 (i.e., DCS3 without synchronized priority), DCS3 has the following clear advantages: 1) Requests Per Second (RPS) increases by up to 52%, 55%, and 2%, respectively; 2) request latency decreases by up to 72%, 71%, and 20%, respectively.
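A minimal sketch of the two mechanisms, assuming a simplified model in which guest queue lengths stand in for workloads; the class, threshold, and priority mapping below are hypothetical, not DCS3's actual interfaces.

```python
# Toy model of the two DCS3 mechanisms (names and thresholds hypothetical):
# stealing balance moves tasks from the busiest vCPU queue to the idlest;
# synchronized priority raises the host priority of heavily loaded vCPUs.
from collections import deque

class VCpu:
    def __init__(self, vid):
        self.vid, self.queue, self.host_prio = vid, deque(), 0

def stealing_balance(vcpus, threshold=2):
    busiest = max(vcpus, key=lambda v: len(v.queue))
    idlest = min(vcpus, key=lambda v: len(v.queue))
    while len(busiest.queue) - len(idlest.queue) > threshold:
        idlest.queue.append(busiest.queue.pop())    # steal from the tail

def synchronized_priority(vcpus):
    for v in vcpus:                                 # guest load -> host priority
        v.host_prio = min(len(v.queue), 19)

vcpus = [VCpu(i) for i in range(4)]
vcpus[0].queue.extend(range(8))
stealing_balance(vcpus)
synchronized_priority(vcpus)
print([(v.vid, len(v.queue), v.host_prio) for v in vcpus])
```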
Citations: 0
ElasticEC: Achieving Fast and Elastic Redundancy Transitioning in Erasure-Coded Clusters
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-29 | DOI: 10.1109/TC.2025.3614839
Yuhui Cai;Guowen Gong;Zhirong Shen;Jiahui Yang;Jiwu Shu
Erasure coding has been extensively deployed in today's commodity HPC systems to guard against unexpected failures. To adapt to varying access characteristics and reliability demands, storage clusters have to perform redundancy transitioning by tuning the coding parameters, which unfortunately gives rise to substantial transitioning traffic. We present ElasticEC, a fast and elastic redundancy transitioning approach for erasure-coded clusters. ElasticEC first minimizes the transitioning traffic by proposing a relocation-aware stripe reorganization mechanism and a collecting-and-encoding algorithm. It further heuristically balances the transitioning traffic across nodes. We implement ElasticEC in Hadoop HDFS and conduct extensive experiments on a real-world cloud storage cluster, showing that ElasticEC can reduce 71.1-92.6% of the transitioning traffic and shorten 65.9-90.7% of the transitioning time.
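To see why transitioning traffic is the key cost, consider a back-of-the-envelope model of naive re-encoding when scaling from RS(k1, m1) to RS(k2, m2). The function below is an invented baseline, not ElasticEC's algorithm; it is exactly this kind of cost that the paper's stripe reorganization and collecting-and-encoding techniques target.

```python
# Back-of-the-envelope traffic model for redundancy transitioning from
# RS(k1, m1) to RS(k2, m2); counts chunks only, ignoring placement details.
from math import lcm

def naive_transition_traffic(k1, m1, k2, m2, data_chunks):
    """Chunks moved if every new stripe gathers its k2 data chunks at one
    encoder node and distributes m2 freshly computed parity chunks."""
    stripes_new = data_chunks // k2
    collect = stripes_new * (k2 - 1)   # at best one chunk is already local
    distribute = stripes_new * m2
    return collect + distribute

# e.g. scale from RS(4,2) to RS(8,2) over lcm(4,8)*100 data chunks
n = lcm(4, 8) * 100
print(naive_transition_traffic(4, 2, 8, 2, n))
```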
Citations: 0
Lightweight Graph Partitioning Enhanced by Implicit Knowledge
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-24 | DOI: 10.1109/TC.2025.3612730
Zhigang Wang;Gongtai Sun;Ning Wang;Lixin Gao;Chuanfei Xu;Yu Gu;Ge Yu;Zhihong Tian
Graph partitioning, a classic NP-complete problem, is the most fundamental procedure that must be performed before parallel computation. Partitioners can be divided into vertex- and edge-based approaches. Recently, both approaches have employed streaming heuristics to find approximate solutions. Streaming is lightweight in space and time complexity, but it suffers from suboptimal partitioning quality, especially for directed graphs, where the explicit knowledge available to the heuristic is limited. This paper therefore proposes new heuristics for both vertex-based and edge-based partitioning. They improve quality by additionally utilizing implicit knowledge, which is embedded in the local streaming view and the global graph view. Memory reduction techniques are presented to extract this knowledge at negligible space cost, preserving the lightweight advantages of streaming partitioning. Besides, we study parallel acceleration and restreaming to further boost partitioning efficiency and quality. Extensive experiments validate that our proposals outperform the state-of-the-art competitors.
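For context, a classic streaming vertex-partitioning heuristic (linear deterministic greedy, LDG) looks like the sketch below. The paper's contribution, the implicit-knowledge scoring layered on top of such heuristics, is not reproduced here.

```python
# Linear deterministic greedy (LDG) streaming vertex partitioning: assign
# each arriving vertex to the part holding most of its already-placed
# neighbors, with a multiplicative penalty for nearly full parts.
def ldg_partition(adjacency, n_parts, capacity):
    """adjacency: dict vertex -> list of neighbors, streamed in order."""
    parts = [set() for _ in range(n_parts)]
    assign = {}
    for v, neighbors in adjacency.items():
        def score(p):
            common = sum(1 for u in neighbors if assign.get(u) == p)
            return common * (1 - len(parts[p]) / capacity)
        # prefer the highest score; break ties toward the emptiest part
        best = max(range(n_parts), key=lambda p: (score(p), -len(parts[p])))
        parts[best].add(v)
        assign[v] = best
    return assign

g = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3], 5: [3, 4]}
print(ldg_partition(g, n_parts=2, capacity=4))
```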
Citations: 0
Fluid Kernels: Seamlessly Conquering the Embedded Computing Continuum
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-19 | DOI: 10.1109/TC.2025.3605745
Federico Terraneo;Daniele Cattaneo
To achieve seamless portability across the embedded computing continuum, we introduce a new kernel architecture: fluid kernels. Fluid kernels can be thought of as the intersection between embedded unikernels and general-purpose monolithic kernels, allowing applications to be developed seamlessly in both kernel space and user space in a unified way. This scalable kernel architecture can manage the trade-off between performance, code size, isolation, and security. We compare our fluid kernel implementation, Miosix, to Linux and FreeRTOS on the same hardware with standard benchmarks. Compared to Linux, we achieve an average speedup of 3.5× and a maximum of up to 15.4×. We also achieve an average code size reduction of 84% and a maximum of up to 90%. By moving application code from user space to kernel space, an additional code size reduction of up to 56% and a speedup of up to 1.3× can be achieved. Compared to FreeRTOS, Miosix costs only a moderate amount of code size (at most 47 KB) in exchange for significant application performance advantages, with speedups averaging 1.5× and reaching up to 5×.
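The user-space versus kernel-space trade can be caricatured in a few lines: the same routine bound as a direct in-kernel call versus marshaled across a syscall-style boundary. This is purely conceptual Python, not Miosix (which is a C++ kernel); every name here is invented for illustration.

```python
# Conceptual toy of the fluid-kernel idea: one application routine, two
# bindings. In kernel space it is a direct call; in user space the argument
# is marshaled across a boundary before reaching the same service.
import time

def kernel_service(x):           # the service implementation
    return x * 2

def in_kernel_call(x):           # app linked into kernel space: direct call
    return kernel_service(x)

def user_space_call(x):          # app in user space: marshal, then dispatch
    request = {"op": "double", "arg": x}     # stand-in for trap + copy-in
    return kernel_service(request["arg"])

for call in (in_kernel_call, user_space_call):
    t0 = time.perf_counter()
    for i in range(100_000):
        call(i)
    print(call.__name__, f"{time.perf_counter() - t0:.4f}s")
```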
Citations: 0
Practical Signature-Free Multivalued Validated Byzantine Agreement and Asynchronous Common Subset in Constant Time
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-09 | DOI: 10.1109/TC.2025.3607476
Xin Wang;Xiao Sui;Sisi Duan;Haibin Zhang
Asynchronous common subset (ACS) is a powerful paradigm enabling applications such as Byzantine fault tolerance (BFT) and multi-party computation (MPC). The most efficient ACS framework in the information-theoretic setting is due to Ben-Or, Kelmer, and Rabin (BKR, 1994). The BKR ACS protocol has been both theoretically and practically impactful. BKR ACS has an O(log n) running time (where n is the number of replicas) due to the usage of n parallel asynchronous binary agreement (ABA) instances, impacting both performance and scalability. Indeed, for a network of 16-64 replicas, the parallel ABA phase occupies about 95-97% of the total runtime. A long-standing open problem is whether we can build an ACS framework with O(1) time while not increasing the message or communication complexity of the BKR protocol. We resolve the open problem, presenting the first constant-time ACS protocol with O(n³) messages in the information-theoretic and signature-free settings. Our key ingredient is the first information-theoretic and constant-time multivalued validated Byzantine agreement (MVBA) protocol. Our results can improve, asymptotically and concretely, various applications using ACS and MVBA. As an example, we implement FIN, a BFT protocol instantiated using our framework. Via a 121-server deployment on Amazon EC2, we show FIN reduces the overhead of the ABA phase to as low as 1.23% of the total runtime.
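The BKR pattern being improved on can be shown schematically: n reliable broadcasts feed n parallel ABA instances, and the common subset is the set of slots whose ABA decided 1. The toy below is fault-free and synchronous, so it captures only the output shape, not the protocol itself or its adversarial cases.

```python
# Toy, fault-free illustration of the BKR ACS output pattern. Real protocols
# run asynchronously against f Byzantine replicas; this only shows the shape.
def bkr_acs(delivered: list) -> list:
    """delivered[i] is the payload of replica i's broadcast, or None if it
    was not delivered in time; ABA slot i then decides 1 iff it was seen."""
    n = len(delivered)
    f = (n - 1) // 3
    votes = [1 if v is not None else 0 for v in delivered]
    if sum(votes) < n - f:          # wait until at least n - f slots filled
        raise RuntimeError("not enough broadcasts delivered yet")
    return [delivered[i] for i in range(n) if votes[i] == 1]

print(bkr_acs(["tx-a", "tx-b", None, "tx-d"]))  # n = 4 tolerates f = 1
```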
Citations: 0
Stochastic Modeling of Intrusion Tolerant Systems Based on Redundancy and Diversity
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-05 | DOI: 10.1109/TC.2025.3606189
Silvano Chiaradonna;Felicita Di Giandomenico;Giulio Masetti
To cope with unforeseen attacks on software systems in critical application domains, redundancy-based intrusion tolerant system (ITS) schemes are among the popular countermeasures to deploy. Designing an ITS adequate for the stated security requirements calls for stochastic analysis support able to assess the impact of a variety of attack patterns on different ITS configurations. As a contribution to this purpose, a stochastic model for ITSs is proposed. Its novel aspects are the ability to account for camouflaging components and for correlation between the security failures affecting the diverse implementations of the software cyber protections adopted in the ITS. Extensive analyses are conducted to show the applicability of the model; the obtained results clarify the limits and strengths of selected ITS configurations when subjected to attacks occurring in conditions unfavorable to the defender.
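Intuition for such models can be built with a Monte Carlo approximation. The sketch below, with invented probabilities and a common-mode correlation knob rho, is not the paper's analytic stochastic model; it merely shows how diversity and failure correlation interact in a majority-quorum ITS.

```python
# Minimal Monte Carlo sketch of a redundant-and-diverse ITS: N variants, an
# attack compromises a variant with probability p, and a correlation factor
# rho models shared vulnerabilities across variants. All numbers hypothetical.
import random

def system_survives(n_variants=3, p=0.2, rho=0.3, quorum=2):
    """Majority-style ITS: survives while >= quorum variants remain intact."""
    shared_hit = random.random() < rho * p      # common-mode vulnerability
    intact = 0
    for _ in range(n_variants):
        compromised = shared_hit or random.random() < p
        intact += not compromised
    return intact >= quorum

trials = 100_000
ok = sum(system_survives() for _ in range(trials))
print(f"estimated survival probability: {ok / trials:.3f}")
```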
Citations: 0
DIVIDE: Efficient RowHammer Defense via In-DRAM Cache-Based Hot Data Isolation
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-05 | DOI: 10.1109/TC.2025.3603729
Haitao Du;Yuxuan Yang;Song Chen;Yi Kang
RowHammer poses a serious reliability challenge to modern DRAM systems. As technology scales down, DRAM resistance to RowHammer has decreased by 30× over the past decade, causing an increasing number of benign applications to suffer from this issue. However, existing defense mechanisms have three limitations: 1) they rely on inefficient mitigation techniques, such as time-consuming victim row refresh; 2) they do not reduce the number of effective RowHammer attacks, leading to frequent mitigations; and 3) they fail to recognize that frequently accessed data is not only a root cause of RowHammer but also presents an opportunity for performance optimization. In this paper, we observe that frequently accessed hot data plays a distinct role in security and efficiency: it can induce RowHammer by interfering with adjacent cold data, while also being performance-critical due to its frequent accesses. To this end, we propose Data Isolation via In-DRAM Cache (DIVIDE), a novel defense mechanism that leverages in-DRAM cache to isolate and exploit hot data. DIVIDE offers three key benefits: 1) It reduces the number of effective RowHammer attacks, as hot data in the cache cannot interfere with each other. 2) It provides a simple yet effective mitigation measure by isolating hot data from cold data. 3) It caches frequently accessed hot data, improving average access latency. DIVIDE employs a two-level protection structure: the first level mitigates RowHammer in cache arrays with high efficiency, while the second level addresses the remaining threats in normal arrays to ensure complete protection. Owing to the high in-DRAM cache hit rate, DIVIDE efficiently mitigates RowHammer while preserving both the performance and energy efficiency of the in-DRAM cache. At a RowHammer threshold of 128, DIVIDE with probabilistic mitigation achieves an average performance improvement of 19.6% and energy savings of 20.4% over DDR4 DRAM for four-core workloads. Compared to an unprotected in-DRAM cache DRAM, DIVIDE incurs only a 2.1% performance overhead while requiring just a modest 1KB per-channel CAM in the memory controller, with no modification to the DRAM chip.
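The hot-data-isolation policy can be sketched as a counter-driven promotion into a small cache. The class, counters, and thresholds below are hypothetical illustrations; DIVIDE realizes this in hardware with an in-DRAM cache plus a small per-channel CAM in the memory controller.

```python
# Toy sketch of hot-data isolation: rows that are activated often get
# promoted into an isolated cache, where they can no longer disturb the
# cold rows physically adjacent to them in the normal array.
from collections import Counter

class HotDataIsolator:
    def __init__(self, hot_threshold=64, cache_rows=8):
        self.activations = Counter()
        self.cache = set()              # rows promoted into the in-DRAM cache
        self.hot_threshold = hot_threshold
        self.cache_rows = cache_rows

    def access(self, row: int) -> str:
        if row in self.cache:
            return "cache hit"          # hot rows cannot hammer cold neighbors
        self.activations[row] += 1
        if (self.activations[row] >= self.hot_threshold
                and len(self.cache) < self.cache_rows):
            self.cache.add(row)         # isolate the aggressor-prone hot row
            return "promoted to cache"
        return "normal array access"

iso = HotDataIsolator(hot_threshold=3)
print([iso.access(42) for _ in range(5)])
```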
Citations: 0
OOLU: An Operation-Based Optimized Sparse LU Decomposition Accelerator for Circuit Simulation
IF 3.8 | CAS Tier 2 (Computer Science) | JCR Q2 (Computer Science, Hardware & Architecture) | Pub Date: 2025-09-04 | DOI: 10.1109/TC.2025.3605751
Ke Hu;Fan Yang
As scientific and engineering challenges grow in complexity and scale, the demand for effective solutions for sparse matrix computations becomes increasingly critical. LU decomposition, known for its ability to reduce computational load and enhance numerical stability, serves as a promising approach. This study focuses on accelerating sparse LU decomposition for circuit simulations, addressing the prolonged simulation times caused by large circuit matrices. We present a novel Operation-based Optimized LU (OOLU) decomposition architecture that significantly improves circuit analysis efficiency. OOLU employs a VLIW-like processing element array and incorporates a scheduler that decomposes computations into a fine-grained operational task flow graph, maximizing inter-operation parallelism. Specialized scheduling and data mapping strategies are applied to align with the adaptable pipelined framework and the characteristics of circuit matrices. The OOLU architecture is prototyped on an FPGA and validated through extensive tests on the University of Florida sparse matrix collection, benchmarked against multiple platforms. The accelerator achieves speedups ranging from 3.48× to 32.25× (average 12.51×) over the KLU software package. It also delivers average speedups of 2.64× over a prior FPGA accelerator and 25.18× and 32.27× over the GPU accelerators STRUMPACK and SFLU, respectively, highlighting the substantial efficiency gains our approach delivers.
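For reference, the kernel being accelerated is LU factorization. A plain dense Doolittle LU without pivoting is shown below; circuit matrices are in fact sparse, and OOLU's contribution is scheduling the resulting fine-grained operations across a PE array, which this sketch does not model.

```python
# Dense Doolittle LU factorization without pivoting (illustrative only):
# A = L @ U, with L unit lower triangular and U upper triangular. Each
# inner update is the kind of fine-grained operation OOLU schedules.
def lu_decompose(a):
    n = len(a)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]          # elimination multiplier
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]     # row update
    return L, U

L, U = lu_decompose([[4.0, 3.0], [6.0, 3.0]])
print(L, U)   # L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
```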
Citations: 0