
IEEE Transactions on Computers: Latest Articles

LAShards: Low-Overhead and Self-Adaptive MRC Construction for Non-Stack Algorithms
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-21 | DOI: 10.1109/TC.2025.3590811
Sanle Zhao;Yujuan Tan;Zhaoyang Zeng;Jing Yu;Zhuoxin Bai;Ao Ren;Xianzhang Chen;Duo Liu
Shared cache systems have become increasingly crucial, especially in cloud services, where the Miss Ratio Curve (MRC) is a widely used tool for evaluating cache performance. The MRC depicts the relationship between the cache miss ratio and cache size, indicating how cache performance trends with varying cache sizes. Recent advancements have enabled efficient MRC construction for stack replacement policies. For non-stack policies, miniature simulation downsizes the actual cache size and data stream through spatially hashed sampling, providing a general method for MRC construction. However, this approach still faces significant challenges. Firstly, constructing an MRC requires numerous mini-caches to obtain miss ratios, consuming significant cache resources and incurring tremendous memory and computing overhead. Secondly, it cannot adapt to dynamic I/O workloads, resulting in a less precise MRC. To address these issues, we propose LAShards, a low-overhead and self-adaptive MRC construction method for non-stack replacement policies. The key idea behind LAShards is to exploit the locality and burstiness in access patterns: it statically reduces memory usage and dynamically adapts to workloads. Compared to previous works, LAShards saves up to $20\times$ memory resources and increases throughput by up to $10\times$.
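To make the miniature-simulation idea concrete, the following minimal sketch (our own illustration, not the LAShards implementation) builds an MRC for a non-stack policy: spatially hashed sampling keeps roughly one percent of the key space, and the sampled stream drives a set of mini-caches, each scaled down by the sampling rate, whose miss ratios become the MRC points. The FIFO policy, the hash function, and the 1% rate are assumptions chosen for illustration.

```python
from collections import deque
import hashlib
import random

def spatial_sample(key: str, rate: float) -> bool:
    """Keep a key iff its hash lands in the lowest `rate` fraction of the hash space,
    so every access to a sampled key is kept (spatially hashed sampling)."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16) % (1 << 24)
    return h < rate * (1 << 24)

class MiniFifoCache:
    """A downsized cache running a non-stack replacement policy (FIFO here)."""
    def __init__(self, size: int):
        self.size = size
        self.queue = deque()
        self.members = set()
        self.hits = self.accesses = 0

    def access(self, key: str) -> None:
        self.accesses += 1
        if key in self.members:
            self.hits += 1
            return
        if len(self.queue) >= self.size:          # evict in arrival order
            self.members.discard(self.queue.popleft())
        self.queue.append(key)
        self.members.add(key)

    def miss_ratio(self) -> float:
        return 1.0 - self.hits / self.accesses if self.accesses else 1.0

def build_mrc(trace, cache_sizes, sample_rate=0.01):
    """Estimate the MRC: one mini-cache per target size, each scaled by sample_rate."""
    minis = {s: MiniFifoCache(max(1, int(s * sample_rate))) for s in cache_sizes}
    for key in trace:
        if spatial_sample(key, sample_rate):
            for cache in minis.values():
                cache.access(key)
    return {s: round(c.miss_ratio(), 3) for s, c in minis.items()}

if __name__ == "__main__":
    random.seed(0)
    # Zipf-like synthetic block trace (heavy-tailed key popularity).
    trace = [f"blk{int(random.paretovariate(1.2))}" for _ in range(200_000)]
    print(build_mrc(trace, cache_sizes=[1_000, 5_000, 20_000, 50_000]))
```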
Citations: 0
WOLF: Weight-Level OutLier and Fault Integration for Reliable LLM Deployment
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-17 | DOI: 10.1109/TC.2025.3587957
Chong Wang;Wanyi Fu;Jiangwei Zhang;Shiyao Li;Rui Hou;Jian Yang;Yu Wang
The rapid advancement of Transformer-based large language models (LLMs) is presenting significant challenges for their deployment, primarily due to their enormous parameter sizes and intermediate results, which create a bottleneck in memory capacity for effective inference. Compared to traditional DRAM, Non-Volatile Memory (NVM) technologies such as Resistive Random-Access Memory (RRAM) and Phase-Change Memory (PCM) offer higher integration density, making them promising alternatives. However, before NVM can be widely adopted, its reliability issues, particularly manufacturing defects and endurance faults, must be addressed. In response to the limited memory capacity and reliability challenges of deploying LLMs in NVM, we introduce a novel low-overhead weight-level map, named Wolf. Wolf not only integrates the addresses of faulty weights to support efficient fault tolerance but also includes the addresses of outlier weights in LLMs. This allows for tensor-wise segmented quantization of both outliers and regular weights, enabling lower-bitwidth quantization. The Wolf framework uses a Bloom Filter-based map to efficiently manage outliers and faults. By employing shared hashes for outliers and faults and specific hashes for faults, Wolf significantly reduces the area overhead. Building on Wolf, we propose a novel fault tolerance method that resolves the observed issue of clustering critical incorrect outliers and fully leverages the inherent resilience of LLMs to improve fault tolerance capabilities. As a result, Wolf achieves segment-wise INT4 quantization with enhanced accuracy. Moreover, Wolf can adeptly handle Bit Error Rates as high as $1\times 10^{-2}$ without compromising accuracy, in stark contrast to the state-of-the-art approach where accuracy declines by more than 20%.
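The weight-level map described above can be illustrated with a small Bloom-filter sketch. The layout below, two hash functions shared by outliers and faults plus one fault-only hash over a single bit array, is our own assumption for illustration and not the WOLF hardware design; it only shows how shared hashes let one structure answer both "is this weight an outlier or a fault?" and "is it a fault?".

```python
import hashlib

class WeightLevelMap:
    """Illustrative Bloom-filter map over weight addresses (assumed layout, not the
    actual WOLF design): outliers and faults both set bits via shared hash functions;
    faults additionally set bits via a fault-only hash function."""

    SHARED = ("shared-0", "shared-1")   # hash salts shared by outliers and faults
    FAULT_ONLY = ("fault-0",)           # extra salt used only for faults

    def __init__(self, bits: int = 1 << 16):
        self.bits = bits
        self.bf = bytearray(bits // 8)

    def _positions(self, addr: int, salts):
        for salt in salts:
            digest = hashlib.blake2b(f"{salt}:{addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.bits

    def _set(self, idx: int) -> None:
        self.bf[idx // 8] |= 1 << (idx % 8)

    def _test(self, idx: int) -> bool:
        return bool(self.bf[idx // 8] >> (idx % 8) & 1)

    def add_outlier(self, addr: int) -> None:
        for idx in self._positions(addr, self.SHARED):
            self._set(idx)

    def add_fault(self, addr: int) -> None:
        for idx in self._positions(addr, self.SHARED + self.FAULT_ONLY):
            self._set(idx)

    def is_outlier_or_fault(self, addr: int) -> bool:
        return all(self._test(i) for i in self._positions(addr, self.SHARED))

    def is_fault(self, addr: int) -> bool:
        return self.is_outlier_or_fault(addr) and all(
            self._test(i) for i in self._positions(addr, self.FAULT_ONLY))

if __name__ == "__main__":
    m = WeightLevelMap()
    m.add_outlier(0x1234)
    m.add_fault(0xBEEF)
    print(m.is_outlier_or_fault(0x1234), m.is_fault(0x1234))  # True False (w.h.p.)
    print(m.is_outlier_or_fault(0xBEEF), m.is_fault(0xBEEF))  # True True
```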
Citations: 0
The Case for Secure Miniservers Beyond the Edge
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-16 | DOI: 10.1109/TC.2025.3589691
Salonik Resch;Hüsrev Cılasun;Zamshed I. Chowdhury;Masoud Zabihi;Yang Lv;Jian-Ping Wang;Sachin S. Sapatnekar;Ismail Akturk;Ulya R. Karpuzcu
Beyond edge devices can function off the power grid and without batteries, making them suitable for deployment in hard-to-reach environments. As the energy budget is extremely tight, energy-hungry long-distance communication required for offloading computation or reporting results to a server becomes a significant limitation. Based on the observation that the energy required for communication decreases with shorter distances, this paper makes a case for the deployment of secure beyond edge miniservers. These are strategically positioned, lightweight local servers designed to support beyond edge devices without compromising the privacy of sensitive information. We demonstrate that even for relatively small scale representative computations – which are more likely to fit into the tight power budget of a beyond edge device for local processing – deploying a beyond edge miniserver can lead to higher performance. To this end, we consider representative deployment scenarios of practical importance, including but not limited to agricultural systems or building structures, where beyond edge miniservers enable highly energy-efficient real-time data processing.
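The distance argument can be made concrete with the standard first-order radio energy model, E_tx(k, d) = E_elec·k + ε_amp·k·d², which is our illustrative assumption rather than a model taken from the paper; the constants below are likewise assumed.

```python
# First-order radio model (assumed constants for illustration, not from the paper):
#   E_tx(k, d) = E_ELEC * k + EPS_AMP * k * d**2       E_rx(k) = E_ELEC * k
E_ELEC = 50e-9     # J/bit spent in transceiver electronics (assumed)
EPS_AMP = 100e-12  # J/bit/m^2 spent in the transmit amplifier (assumed)

def tx_energy(bits: int, distance_m: float) -> float:
    """Energy in joules to transmit `bits` over `distance_m` metres."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2

if __name__ == "__main__":
    payload = 8 * 1024 * 8                    # one 8 KiB report, in bits
    for d in (10, 100, 1000):                 # nearby miniserver vs. far-away gateway
        print(f"d = {d:>4} m -> {tx_energy(payload, d) * 1e3:.3f} mJ")
    # The d**2 amplifier term makes the 1 km link two orders of magnitude costlier
    # than the 100 m link in amplifier energy, which is the case for placing a
    # miniserver close to the beyond-edge devices.
```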
Citations: 0
A Highly Reliable Multiplexing Scheme in Hypercube-Structured Hierarchical Networks
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-16 | DOI: 10.1109/TC.2025.3589732
Xuanli Liu;Zhenjiang Dong;Weibei Fan;Mengjie Lv;Xueli Sun;Jin Qi;Sun-Yuan Hsieh
The design and optimization of network topologies play a critical role in ensuring the performance and efficiency of high-performance computing (HPC) systems. Traditional topology designs often fall short in satisfying the stringent requirements of HPC environments, particularly with respect to fault tolerance, latency, and bandwidth. To address these limitations, we propose a novel class of hierarchical networks, termed Hypercube-Structured Hierarchical Networks (HHNs). This architecture generalizes and extends existing architectures such as half hypercube networks and complete cubic networks, while also introducing previously unexplored hierarchical designs. HHNs exhibit several advantages, particularly in high-performance computing. Most notably, their high connectivity enables efficient parallel data processing, and their hierarchical structure supports scalability to accommodate growing computational demands. Furthermore, we present a unicast routing strategy and a broadcast algorithm for HHNs. A fault-tolerant algorithm is also designed based on the construction of disjoint paths. Experimental evaluations demonstrate that HHNs consistently outperform mainstream architectures in critical performance metrics, including scalability, latency, and robustness to failures.
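For readers unfamiliar with hypercube routing, the classic shortest-path unicast in a plain hypercube corrects the differing address bits one dimension at a time, so the hop count equals the Hamming distance. The sketch below shows this generic baseline, not the HHN-specific routing or fault-tolerant algorithms from the paper.

```python
def hypercube_route(src: int, dst: int, n: int) -> list[int]:
    """Dimension-order shortest path in an n-dimensional hypercube Q_n.
    Each hop flips one bit in which src and dst differ, so the number of hops
    equals the Hamming distance between the two addresses."""
    assert 0 <= src < (1 << n) and 0 <= dst < (1 << n)
    path, cur = [src], src
    diff = src ^ dst
    for bit in range(n):
        if diff >> bit & 1:
            cur ^= 1 << bit        # correct dimension `bit`
            path.append(cur)
    return path

if __name__ == "__main__":
    # Route 0000 -> 1011 in Q_4: three hops (Hamming distance 3).
    print([format(v, "04b") for v in hypercube_route(0b0000, 0b1011, 4)])
```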
Citations: 0
A Highly Scalable Network Architecture for Optical Data Centers
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-16 | DOI: 10.1109/TC.2025.3589688
Weibei Fan;Yao Pan;Fu Xiao;Pinchang Zhang;Lei Han;Sun-Yuan Hsieh
Optical Data Center Networks (ODCNs) are high-performance interconnect architectures for parallel and distributed computing, providing higher bandwidth and lower power consumption. However, current optical DCNs struggle to achieve both high scalability and incremental scalability simultaneously. In this paper, we propose an extended Exchanged hyperCube, denoted by ExCube, which is a highly scalable network architecture for optical data centers. Firstly, we detail the addressing scheme and construction method for ExCube, which offers flexible scalability modes (exponential, linear, and composite) to meet diverse scalability requirements. In particular, the diameter of ExCube remains unchanged as its size increases linearly, indicating superior incremental scalability. Secondly, an efficient routing algorithm with linear time complexity is presented to determine the shortest path between any two different ToRs in ExCube. Additionally, we propose a per-flow scheduling algorithm based on disjoint paths to enhance the performance of ExCube. The optical devices in ExCube are identical to those in existing optical DCNs, such as WaveCube and OSA, facilitating its construction. Experimental results demonstrate that ExCube outperforms WaveCube in terms of throughput and reduces data transmission time by 5%-35%. Further analysis reveals that ExCube maintains performance comparable to WaveCube across several critical metrics, including low diameter and link complexity. Compared with advanced networks, ExCube reduces overall cost and energy consumption by 36.7% and 46.5%, respectively.
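The per-flow scheduling idea, spreading flows over precomputed disjoint paths so that packets of a flow stay ordered while load is balanced across paths, can be sketched with a simple hash-based picker. This is a generic illustration under our own assumptions (topology labels and hash choice are hypothetical), not the ExCube scheduling algorithm.

```python
import hashlib
from collections import defaultdict

def pick_path(flow_id: str, disjoint_paths: list[list[str]]) -> list[str]:
    """Deterministically map a flow onto one of the disjoint paths between a ToR pair,
    so packets of the same flow stay in order on a single path."""
    h = int(hashlib.sha1(flow_id.encode()).hexdigest(), 16)
    return disjoint_paths[h % len(disjoint_paths)]

if __name__ == "__main__":
    # Three node-disjoint ToR-to-ToR paths (hypothetical topology labels).
    paths = [["ToR0", "S1", "ToR7"], ["ToR0", "S2", "ToR7"], ["ToR0", "S3", "ToR7"]]
    load = defaultdict(int)
    for i in range(9000):
        load[tuple(pick_path(f"flow-{i}", paths))] += 1
    for p, n in sorted(load.items()):
        print(" -> ".join(p), n)   # flows spread roughly evenly across the paths
```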
Citations: 0
AdaptDQC: Adaptive Distributed Quantum Computing With Quantitative Performance Analysis
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-14 | DOI: 10.1109/TC.2025.3586027
Debin Xiang;Liqiang Lu;Siwei Tan;Xinghui Jia;Zhe Zhou;Guangyu Sun;Mingshuai Chen;Jianwei Yin
We present AdaptDQC, an adaptive compiler framework for optimizing distributed quantum computing (DQC) under diverse performance metrics and inter-chip communication (ICC) architectures. AdaptDQC leverages a novel spatial-temporal graph model to describe quantum circuits, model ICC architectures, and quantify critical performance metrics in DQC systems, yielding a systematic and adaptive approach to constructing circuit-partitioning and chip-mapping strategies that admit hybrid ICC architectures and are optimized against various objectives. Experimental results on a collection of benchmarks show that AdaptDQC outperforms state-of-the-art compiler frameworks: It reduces, on average, the communication cost by up to 35.4% and the latency by up to 38.4%.
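One of the quantities such a framework must trade off, inter-chip communication, is easy to measure once a circuit is viewed as a list of two-qubit gates and a qubit-to-chip assignment is fixed. The toy cost function below (our own simplification, not AdaptDQC's spatial-temporal graph model) simply counts gates whose operands land on different chips.

```python
def icc_cost(two_qubit_gates, chip_of):
    """Count two-qubit gates whose operands are mapped to different chips.
    `two_qubit_gates` is a list of (q_a, q_b) pairs in program order;
    `chip_of` maps qubit index -> chip id."""
    return sum(1 for a, b in two_qubit_gates if chip_of[a] != chip_of[b])

if __name__ == "__main__":
    # Toy 4-qubit circuit: CNOTs between these qubit pairs.
    gates = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
    partition_a = {0: 0, 1: 0, 2: 1, 3: 1}   # qubits {0,1} on chip 0, {2,3} on chip 1
    partition_b = {0: 0, 1: 1, 2: 0, 3: 1}
    print("partition A cross-chip gates:", icc_cost(gates, partition_a))  # 3
    print("partition B cross-chip gates:", icc_cost(gates, partition_b))  # 4
```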
Citations: 0
GATe: Efficient Graph Attention Network Acceleration With Near-Memory Processing
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-11 | DOI: 10.1109/TC.2025.3588317
Shiyan Yi;Yudi Qiu;Guohao Xu;Lingfei Lu;Xiaoyang Zeng;Yibo Fan
Graph Attention Network (GAT) has gained widespread adoption thanks to its exceptional performance in processing non-Euclidean graphs. The critical components of a GAT model are aggregation and attention, which cause numerous main-memory accesses and occupy a significant share of inference time. Recently, much research has proposed near-memory processing (NMP) architectures to accelerate aggregation. However, graph attention requires additional operations distinct from aggregation, making previous NMP architectures less suitable for supporting GAT, as they typically target aggregation-only workloads. In this paper, we propose GATe, a practical and efficient GAT accelerator with an NMP architecture. To the best of our knowledge, this is the first work to accelerate both attention and aggregation computation on DIMMs. We unify feature vector access to eliminate the two repetitive memory accesses to source nodes caused by the sequential phase-by-phase execution of attention and aggregation. Next, we refine the computation flow to reduce data dependencies in concatenation and softmax, which lowers on-chip memory usage and communication overhead. Additionally, we introduce a novel sharding method that enhances data reusability of high-degree nodes. Experiments show that GATe achieves substantial speedups of the GAT attention and aggregation phases of up to $6.77\times$ and $2.46\times$ ($3.69\times$ and $2.24\times$ on average), respectively, compared to the state-of-the-art NMP works GNNear and GraNDe.
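For reference, the two phases the accelerator targets follow the standard GAT formulation: per-edge attention scores from a LeakyReLU over transformed features, a softmax over each neighborhood, and a weighted aggregation. The NumPy sketch below is that reference computation for a single head, not the accelerator's dataflow; shapes and the random test graph are illustrative.

```python
import numpy as np

def gat_layer(h, adj, W, a, alpha=0.2):
    """Single-head GAT reference: h (N,F_in), adj (N,N) with self-loops, W (F_in,F_out),
    a (2*F_out,). Attention phase computes e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) and a
    softmax over each neighborhood; aggregation phase is the weighted neighbor sum."""
    z = h @ W                                            # feature transformation
    f = z.shape[1]
    raw = (z @ a[:f])[:, None] + (z @ a[f:])[None, :]    # a^T[z_i || z_j] as two dot products
    raw = np.where(raw > 0, raw, alpha * raw)            # LeakyReLU
    e = np.where(adj > 0, raw, -np.inf)                  # mask non-edges
    e = e - e.max(axis=1, keepdims=True)                 # numerically stable softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)           # attention coefficients
    return att @ z                                       # aggregation phase

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, F_in, F_out = 5, 8, 4
    adj = ((np.eye(N) + (rng.random((N, N)) < 0.4)) > 0).astype(float)
    out = gat_layer(rng.standard_normal((N, F_in)), adj,
                    rng.standard_normal((F_in, F_out)),
                    rng.standard_normal(2 * F_out))
    print(out.shape)   # (5, 4)
```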
Citations: 0
ML-PTA: A Two-Stage ML-Enhanced Framework for Accelerating Nonlinear DC Circuit Simulation With Pseudo-Transient Analysis
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-10 | DOI: 10.1109/TC.2025.3587470
Zhou Jin;Wenhao Li;Haojie Pei;Xiaru Zha;Yichao Dong;Xiang Jin;Xiao Wu;Dan Niu;Wei W. Xing
Direct current (DC) analysis lies at the heart of integrated circuit design in seeking DC operating points. Although pseudo-transient analysis (PTA) methods have been widely used for DC analysis in both industry and academia, their initial parameters and stepping strategy require expert knowledge and laborious manual tuning to deliver efficient performance, which hinders their further application. In this paper, we leverage the latest advancements in machine learning to deploy PTA with more efficient setups for different problems. More specifically, active learning, which automatically draws knowledge from other circuits, is used to provide suitable initial parameters for the PTA solver, which are then calibrated on the fly using TD3-based reinforcement learning (RL) to further accelerate the simulation process. To expedite model convergence, we introduce dual agents and a public sampling buffer in our RL method to enhance sample utilization. To further improve the learning efficiency of the RL agent, we incorporate imitation learning to improve the reward function and introduce supervised learning to provide a better dual-agent rotation strategy. We make the proposed algorithm a general out-of-the-box SPICE-like solver and assess it on a variety of circuits, demonstrating up to a $3.10\times$ reduction in NR iterations for the initial stage and $285.71\times$ for the RL stage.
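Pseudo-transient analysis itself embeds the DC problem F(x) = 0 into an artificial transient x' = -F(x) and marches it toward steady state with a growing pseudo time-step; the initial step and the growth/shrink factors are exactly the kind of knobs ML-PTA learns to set. The sketch below is a textbook PTA loop with assumed constants and a toy two-equation system, not the paper's solver.

```python
import numpy as np

def pta_solve(F, J, x0, dt0=1e-3, grow=1.6, shrink=0.5, tol=1e-10, max_steps=200):
    """Pseudo-transient continuation for F(x) = 0.
    Each pseudo time-step solves (I/dt + J(x)) * delta = -F(x), i.e. one implicit-Euler
    step of x' = -F(x); dt grows when the residual drops and shrinks otherwise.
    dt0 / grow / shrink are illustrative defaults, the kind of parameters ML-PTA tunes."""
    x, dt = np.asarray(x0, float), dt0
    res = np.linalg.norm(F(x))
    for _ in range(max_steps):
        if res < tol:
            break
        A = np.eye(len(x)) / dt + J(x)
        delta = np.linalg.solve(A, -F(x))
        new_res = np.linalg.norm(F(x + delta))
        if new_res < res:                 # accept and speed up the pseudo clock
            x, res, dt = x + delta, new_res, dt * grow
        else:                             # reject and slow down
            dt *= shrink
    return x

if __name__ == "__main__":
    # Toy nonlinear system with root near (0.567, 0.736).
    F = lambda x: np.array([x[0] - np.exp(-x[0]), x[1] ** 3 + x[1] - 2 * x[0]])
    J = lambda x: np.array([[1 + np.exp(-x[0]), 0.0],
                            [-2.0, 3 * x[1] ** 2 + 1]])
    print(pta_solve(F, J, x0=[5.0, 5.0]))
```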
Citations: 0
Synergistic Memory Optimisations: Precision Tuning in Heterogeneous Memory Hierarchies
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-10 | DOI: 10.1109/TC.2025.3586025
Gabriele Magnani;Daniele Cattaneo;Lev Denisov;Giuseppe Tagliavini;Giovanni Agosta;Stefano Cherubin
Balancing energy efficiency and high performance in embedded systems requires fine-tuning hardware and software components to co-optimize their interaction. In this work, we address the automated optimization of memory usage through a compiler toolchain that leverages DMA-aware precision tuning and mathematical function memorization. The proposed solution extends the LLVM infrastructure, employing the TAFFO plugins for precision tuning, with the SeTHet extension for DMA-aware precision tuning and luTHet for automated, DMA-aware mathematical function memorization. We performed an experimental assessment on HERO, a heterogeneous platform employing RISC-V cores as a parallel accelerator. Our solution enables speedups ranging from $1.5\times$ to $51.1\times$ on AxBench benchmarks that employ trigonometrical functions and $4.23\times$ to $48.4\times$ on Polybench benchmarks over the baseline HERO platform.
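Precision tuning of the kind TAFFO performs ultimately reduces to choosing, from a known value range, a fixed-point format that covers the integer part and spends the remaining bits on the fraction. The sketch below shows only that range-to-format step (a simplified illustration with a hand-supplied range, not TAFFO's actual analysis passes).

```python
from math import ceil, log2

def choose_fixed_point(lo: float, hi: float, word_bits: int = 32):
    """Pick a signed Qm.n fixed-point format for values known to lie in [lo, hi]:
    m integer bits to cover the magnitude, the remaining n bits for the fraction."""
    magnitude = max(abs(lo), abs(hi), 1e-30)
    int_bits = max(0, ceil(log2(magnitude + 1)))     # bits for the integer part
    frac_bits = word_bits - 1 - int_bits             # one bit reserved for the sign
    if frac_bits < 0:
        raise ValueError("range does not fit in the requested word size")
    return int_bits, frac_bits

def to_fixed(x: float, frac_bits: int) -> int:
    return round(x * (1 << frac_bits))

def from_fixed(q: int, frac_bits: int) -> float:
    return q / (1 << frac_bits)

if __name__ == "__main__":
    # Value-range analysis (here: hand-supplied) says the signal stays in [-3.2, 3.2].
    m, n = choose_fixed_point(-3.2, 3.2)             # -> Q3.28 on a 32-bit word
    x = 1.2345
    q = to_fixed(x, n)
    print(f"Q{m}.{n}: {x} -> {q} -> {from_fixed(q, n):.9f}")
```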
Citations: 0
A High-Efficiency Parallel Mechanism for Canonical Polyadic Decomposition on Heterogeneous Computing Platform
IF 3.8 | CAS Tier 2, Computer Science | Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Pub Date: 2025-07-10 | DOI: 10.1109/TC.2025.3587623
Xiaosong Peng;Laurence T. Yang;Xiaokang Wang;Debin Liu;Jie Li
Canonical Polyadic decomposition (CPD) obtains a low-rank approximation of high-order multidimensional tensors through the summation of a sequence of rank-one tensors, greatly reducing storage and computation overhead. It is increasingly used in the lightweight design of artificial intelligence and in big data processing. Existing CPD techniques exhibit inherent limitations in simultaneously achieving high accuracy and high efficiency. In this paper, a heterogeneous computing method for CPD is proposed to optimize computing efficiency with guaranteed convergence accuracy. Specifically, a quasi-convex decomposition loss function is constructed and the extreme points of the Kruskal matrix rows are solved. Further, the massively parallelizable operators in the algorithm are extracted, a software-hardware integrated scheduling method is designed, and the deployment of CPD on heterogeneous computing platforms is achieved. Finally, the memory access strategy is optimized to improve memory access efficiency. We tested the algorithm on real-world and synthetic sparse tensor datasets; numerical experimental results show that, compared with state-of-the-art methods, the proposed method achieves higher convergence accuracy and computing efficiency. Compared to the standard CPD parallel library, the method achieves efficiency improvements of tens to hundreds of times while maintaining the same accuracy.
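The decomposition itself, approximating a tensor by a sum of R rank-one terms, is classically computed with alternating least squares (ALS). The NumPy sketch below is plain dense CP-ALS given as a reference point, not the paper's quasi-convex heterogeneous algorithm.

```python
import numpy as np

def cp_als(X, rank, iters=100, seed=0):
    """Plain CP-ALS for a dense 3-way tensor X ≈ sum_r a_r ⊗ b_r ⊗ c_r.
    Returns factor matrices A (I,R), B (J,R), C (K,R)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((d, rank)) for d in (I, J, K))

    def khatri_rao(U, V):                       # column-wise Kronecker product
        return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

    for _ in range(iters):
        # Each update is a linear least-squares solve:
        # X_(n) ≈ factor_n @ khatri_rao(others)^T, normal equations via Hadamard products.
        A = np.reshape(X, (I, -1)) @ khatri_rao(B, C) @ np.linalg.pinv(
            (B.T @ B) * (C.T @ C))
        B = np.reshape(np.moveaxis(X, 1, 0), (J, -1)) @ khatri_rao(A, C) @ np.linalg.pinv(
            (A.T @ A) * (C.T @ C))
        C = np.reshape(np.moveaxis(X, 2, 0), (K, -1)) @ khatri_rao(A, B) @ np.linalg.pinv(
            (A.T @ A) * (B.T @ B))
    return A, B, C

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Build a rank-3 ground-truth tensor and recover it.
    A0, B0, C0 = rng.random((6, 3)), rng.random((7, 3)), rng.random((8, 3))
    X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
    A, B, C = cp_als(X, rank=3)
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    print("relative error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```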
Citations: 0