
Latest Publications: IEEE Transactions on Parallel and Distributed Systems

New Scheduling Algorithm and Analysis for Partitioned Periodic DAG Tasks on Multiprocessors
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-17 | DOI: 10.1109/TPDS.2025.3611446
Haochun Liang;Xu Jiang;Junyi Liu;Xiantong Luo;Songran Liu;Nan Guan;Wang Yi
Real-time systems are increasingly shifting from single processors to multiprocessors, where software must be parallelized to fully exploit the additional computational power. While the scheduling of real-time parallel tasks modeled as directed acyclic graphs (DAGs) has been extensively studied in the context of global scheduling, the scheduling and analysis of real-time DAG tasks under partitioned scheduling remain far less developed compared to the traditional scheduling of sequential tasks. Existing approaches primarily target plain fixed-priority partitioned scheduling and often rely on self-suspension–based analysis, which limits opportunities for further optimization. In particular, such methods fail to fully leverage fine-grained scheduling management that could improve schedulability. In this paper, we propose a novel approach for scheduling periodic DAG tasks, in which each DAG task is transformed into a set of real-time transactions by incorporating mechanisms for enforcing release offsets and intra-task priority assignments. We further develop corresponding analysis techniques and partitioning algorithms. Through comprehensive experiments, we evaluate the real-time performance of the proposed methods against state-of-the-art scheduling and analysis techniques. The results demonstrate that our approach consistently outperforms existing methods for scheduling periodic DAG tasks across a wide range of parameter settings.
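A minimal sketch of one ingredient of this transformation, the release offsets: if each subtask's offset equals the longest predecessor path measured in worst-case execution times (WCETs), no subtask is released before its dependencies could have completed. The `wcet`/`edges` encoding and the longest-path rule are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch (not the paper's exact algorithm): derive release offsets
# for the subtasks of a periodic DAG by longest-path depth, so a subtask is
# only released once every predecessor's WCET budget has elapsed.
from collections import defaultdict

def release_offsets(wcet, edges):
    """wcet: node -> worst-case execution time; edges: precedence pairs."""
    succs = defaultdict(list)
    indeg = defaultdict(int)
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    # offset(v) = max over predecessors u of offset(u) + wcet(u)
    offset = {v: 0 for v in wcet}
    ready = [v for v in wcet if indeg[v] == 0]
    while ready:
        u = ready.pop()
        for v in succs[u]:
            offset[v] = max(offset[v], offset[u] + wcet[u])
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return offset

# Diamond-shaped DAG: a precedes b and c, which both precede d.
wcet = {"a": 2, "b": 3, "c": 1, "d": 2}
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
print(release_offsets(wcet, edges))  # {'a': 0, 'b': 2, 'c': 2, 'd': 5}
```

With offsets enforced, each subtask can be treated as an offset-released periodic job, the shape that a transaction-based per-processor analysis works with.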
Citations: 0
HARMONIC: Uncertainty-Aware Multi-Objective Optimization for Energy-Efficient HPC Resource Management
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-16 | DOI: 10.1109/TPDS.2025.3610354
Kyrian C. Adimora;Hongyang Sun
Exascale high-performance computing (HPC) systems face critical resource management challenges, including megawatt-scale energy consumption per facility, performance variability across identical jobs, and inefficient resource utilization. Traditional single-objective schedulers cannot address these multifaceted challenges effectively. This paper introduces HARMONIC (Holistic Adaptive Resource Management Optimizing Next-generation Interconnected Computing), a novel framework that simultaneously optimizes performance, energy efficiency, and resilience through uncertainty-aware multi-objective optimization. Our approach distinguishes aleatoric uncertainty (inherent system variability) from epistemic uncertainty (modeling limitations) using Bayesian neural networks and employs graph-based representations to capture complex system dependencies. Experimental validation in both simulated environments and controlled testbeds demonstrates significant improvements over state-of-the-art schedulers: 10–19% energy reduction, 16–25% throughput improvement, and 18–32% performance variability reduction. These results translate to potential annual savings of millions of dollars per exascale facility while enhancing scientific productivity through improved experimental reproducibility.
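The aleatoric/epistemic split can be pictured with the standard law-of-total-variance decomposition over stochastic forward passes of a Bayesian model; the toy numbers below stand in for per-sample runtime predictions and are not HARMONIC's model.

```python
# Illustrative decomposition: given S stochastic forward passes, each giving
# a predictive mean and variance for a job's runtime,
#   aleatoric ~ mean of per-sample variances (inherent system variability)
#   epistemic ~ variance of per-sample means (modeling limitations)
import numpy as np

rng = np.random.default_rng(0)
S = 100
means = 50.0 + rng.normal(0.0, 2.0, size=S)       # sampled predicted runtimes (s)
variances = np.abs(rng.normal(9.0, 1.0, size=S))  # sampled predicted noise vars

aleatoric = variances.mean()   # irreducible: schedule around it
epistemic = means.var()        # reducible: shrinks with more training data
total = aleatoric + epistemic  # law of total variance
print(f"aleatoric={aleatoric:.2f}  epistemic={epistemic:.2f}  total={total:.2f}")
```

Separating the two matters for scheduling: epistemic uncertainty can be reduced by gathering more data, while aleatoric uncertainty must be absorbed by the plan itself.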
Citations: 0
FedBiF: Communication-Efficient Federated Learning via Bits Freezing
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-16 | DOI: 10.1109/TPDS.2025.3610224
Shiwei Li;Qunwei Li;Haozhao Wang;Ruixuan Li;Jianbin Lin;Wenliang Zhong
Federated learning (FL) is an emerging distributed machine learning paradigm that enables collaborative model training without sharing local data. Despite its advantages, FL suffers from substantial communication overhead, which can affect training efficiency. Recent efforts have mitigated this issue by quantizing model updates to reduce communication costs. However, most existing methods apply quantization only after local training, introducing quantization errors into the trained parameters and potentially degrading model accuracy. In this letter, we propose Federated Bit Freezing (FedBiF), a novel FL framework that directly learns quantized model parameters during local training. In each communication round, the server first quantizes the model parameters and transmits them to the clients. FedBiF then allows each client to update only a single bit of the multi-bit parameter representation, freezing the remaining bits. This bit-by-bit update strategy reduces each parameter update to one bit while maintaining high precision in parameter representation. Extensive experiments are conducted on five widely used datasets under both IID and Non-IID settings. The results demonstrate that FedBiF not only achieves superior communication compression but also promotes sparsity in the resulting models. Notably, FedBiF attains accuracy comparable to FedAvg, even when using only 1 bit-per-parameter (bpp) for uplink and 3 bpp for downlink communication.
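A minimal sketch of the bit-freezing idea, under the assumption of a plain fixed-point code (not FedBiF's actual parameter codec): a round exposes one bit position of each parameter as trainable and freezes the rest, so the uplink carries one bit per parameter.

```python
# Illustrative bit freezing: a weight is an 8-bit two's-complement code times
# a scale; this round only bit position `round_bit_pos` may change.
def set_bit(code: int, pos: int, bit: int, width: int = 8) -> int:
    """Overwrite bit `pos` of a `width`-bit two's-complement code."""
    mask = 1 << pos
    code = (code & ~mask) | (bit << pos)
    if code >= 1 << (width - 1):   # reinterpret as signed
        code -= 1 << width
    return code

scale = 0.01          # weight = code * scale
code = 37             # server-quantized parameter received this round
round_bit_pos = 2     # the single trainable bit; all other bits frozen

for chosen in (0, 1):  # the client keeps whichever value lowers its loss
    w = set_bit(code, round_bit_pos, chosen) * scale
    print(f"bit={chosen} -> weight={w:+.4f}")
```

The client then uploads only the chosen bit and the server splices it into its copy of the code, so parameter precision stays multi-bit while per-round communication is one bit per parameter.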
Citations: 0
MIST: Towards MPI Instant Startup and Termination on Tianhe HPC Systems
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-11 | DOI: 10.1109/TPDS.2025.3608434
Yiqin Dai;Ruibo Wang;Yong Dong;Min Xie;Juan Chen;Wenzhe Zhang;Huijun Wu;Mingtian Shao;Kai Lu
As the size of MPI programs grows with expanding HPC resources and parallelism demands, the overhead of MPI startup and termination escalates due to the inclusion of less scalable global operations. Global operations involving extensive cross-machine communication and synchronization are crucial for ensuring semantic correctness. Current work focuses on optimizing and accelerating these global operations rather than removing them, as removal entails systematic changes to the system software stack and may affect program semantics. Given this background, we propose a systematic solution named MIST to safely eliminate global operations in MPI startup and termination. By optimizing the generation of communication addresses, designing reliable communication protocols, and exploiting the resource release mechanism, MIST eliminates all global operations to achieve MPI instant startup and termination while ensuring correct program execution. Experiments on the Tianhe-2A supercomputer demonstrate that MIST can reduce MPI_Init() time by 32.5-77.6% and MPI_Finalize() time by 28.9-85.0%.
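MIST itself changes the system software stack, but the quantity it improves is easy to measure. Below is a small harness (assuming mpi4py is available) that times MPI_Init() and MPI_Finalize() explicitly.

```python
# time_mpi.py: measure wall-clock time of MPI_Init()/MPI_Finalize().
# mpi4py normally initializes MPI at import time, so defer that first.
import time

import mpi4py
mpi4py.rc.initialize = False   # do not call MPI_Init on import
mpi4py.rc.finalize = False     # do not call MPI_Finalize at exit
from mpi4py import MPI

t0 = time.perf_counter()
MPI.Init()
t_init = time.perf_counter() - t0

rank = MPI.COMM_WORLD.Get_rank()

t0 = time.perf_counter()
MPI.Finalize()
t_fin = time.perf_counter() - t0

if rank == 0:
    print(f"MPI_Init: {t_init*1e3:.2f} ms  MPI_Finalize: {t_fin*1e3:.2f} ms")
```

Launched as, e.g., `mpirun -n 128 python time_mpi.py`, this yields the baseline timings against which MIST's reported 32.5-77.6% and 28.9-85.0% reductions are measured.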
Citations: 0
XDGNN: Efficient Distributed GNN Training via Explanation-Guided Subgraph Expansion
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-11 | DOI: 10.1109/TPDS.2025.3609152
Jie Gao;Jia Hu;Geyong Min;Fei Hao
Graph neural networks (GNNs) are a state-of-the-art technique for learning structural information from graph data. However, training GNNs on large-scale graphs is very challenging due to the size of real-world graphs and the message-passing architecture of GNNs. One promising approach for scaling GNNs is distributed training across multiple accelerators, where each accelerator holds a partitioned subgraph that fits in memory and trains the model in parallel. Existing distributed GNN training methods require frequent and prohibitively expensive embedding exchanges between partitions, incurring substantial communication overhead and limiting training efficiency. To address this challenge, we propose XDGNN, a novel distributed GNN training method that eliminates the forward communication bottleneck and thus accelerates training. Specifically, we design an explanation-guided subgraph expansion technique that incorporates important structures identified by eXplanation AI (XAI) methods into local partitions, mitigating the information loss caused by graph partitioning. XDGNN then conducts communication-free distributed training on these self-contained partitions, training the model in parallel without communicating node embeddings in the forward phase. Extensive experiments demonstrate that XDGNN significantly improves training efficiency while maintaining model accuracy compared with current distributed GNN training methods.
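A toy sketch of explanation-guided expansion: rank cross-partition neighbors by an importance score (standing in here for the scores an XAI method would produce, an assumption) and absorb the top-k into the local partition, so that forward passes need no remote embeddings.

```python
# Illustrative expansion: grow a local partition with its k most important
# boundary neighbors, whose features can then be replicated locally.
def expand_partition(local_nodes, edges, importance, k):
    local = set(local_nodes)
    boundary = {}
    for u, v in edges:                 # undirected edge list
        for a, b in ((u, v), (v, u)):
            if a in local and b not in local:
                boundary[b] = max(boundary.get(b, 0.0), importance[(a, b)])
    top = sorted(boundary, key=boundary.get, reverse=True)[:k]
    return local | set(top)

edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
importance = {(u, v): abs(u - v) / 5 for u, v in edges}      # stand-in scores
importance.update({(v, u): s for (u, v), s in list(importance.items())})
print(expand_partition({0, 1}, edges, importance, k=1))      # {0, 1, 4}
```

The trade-off is memory for communication: replicated nodes cost storage on each partition but remove the per-iteration embedding exchange.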
Citations: 0
ToT: Triangle Counting on Tensor Cores
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-08 | DOI: 10.1109/TPDS.2025.3606878
YuAng Chen;Jeffrey Xu Yu
Triangle counting is a fundamental graph algorithm used to identify the number of triangles within a graph. The algorithm can be reformulated into linear algebraic operations, including sparse matrix multiplication, intersection, and reduction. Modern GPUs, equipped with Tensor Cores, offer massive parallelism that can significantly accelerate graph algorithms. However, leveraging Tensor Cores, originally designed for dense matrix multiplication, to handle sparse workloads for triangle counting presents non-trivial challenges. In this paper, we conduct an in-depth analysis of the state-of-the-art techniques that utilize Tensor Cores for matrix operations, identifying critical performance shortfalls. Based on these insights, we introduce ToT, which enhances the utilization of Tensor Cores and expands their functionality to diverse sparse matrix operations. In experiments, ToT is evaluated against state-of-the-art methods. ToT outperforms the second-fastest method with a 3.81× speedup in end-to-end execution and achieves up to 17.00× memory savings. This work represents a pioneering exploration of Tensor Cores for accelerating the triangle counting algorithm.
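The linear-algebraic reformulation is compact: for an undirected simple graph with adjacency matrix A, each triangle contributes six closed walks of length 3, so the triangle count is trace(A³)/6. A dense NumPy version for clarity; ToT's contribution is performing the sparse equivalent on Tensor Cores.

```python
# Triangle counting as linear algebra: #triangles = trace(A^3) / 6
# for an undirected simple graph (no self-loops).
import numpy as np

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # exactly one triangle: 0-1-2
n = 4
A = np.zeros((n, n), dtype=np.int64)
for u, v in edges:
    A[u, v] = A[v, u] = 1

triangles = np.trace(A @ A @ A) // 6
print(triangles)  # 1
```

The hard part on real graphs is that A is extremely sparse while Tensor Cores expect dense tiles, which is the mismatch the paper analyzes.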
Citations: 0
Multi-Agent Collaboration for Workflow Task Offloading in End-Edge-Cloud Environments Using Deep Reinforcement Learning
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-04 | DOI: 10.1109/TPDS.2025.3606001
Bohuai Xiao;Chujia Yu;Xing Chen;Zheyi Chen;Geyong Min
Computation offloading utilizes powerful cloud and edge resources to process workflow applications offloaded from Mobile Devices (MDs), effectively alleviating the resource constraints of MDs. In end-edge-cloud environments, workflow applications typically exhibit complex task dependencies, while parallel tasks from multiple MDs produce an expansive solution space for offloading decisions. Determining optimal offloading plans in highly dynamic and complex end-edge-cloud environments therefore presents significant challenges. Existing studies on task offloading for multi-MD workflows often adopt centralized decision-making, which suffers from prolonged decision time, high computational overhead, and an inability to identify suitable offloading plans in large-scale scenarios. To address these challenges, we propose a Multi-agent Collaborative method for Workflow Task offloading in end-edge-cloud environments based on the Actor-Critic algorithm, called MCWT-AC. First, each MD is modeled as an agent that independently makes offloading decisions based on local information. Next, each MD's workflow task offloading decision model is obtained through the Actor-Critic algorithm. At runtime, an effective workflow task offloading plan is gradually developed through multi-agent collaboration. Extensive simulation results demonstrate that MCWT-AC exhibits superior adaptability and scalability; moreover, it outperforms state-of-the-art methods and can quickly achieve optimal or near-optimal performance.
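A toy sketch of the decentralized decision structure only (not MCWT-AC's trained networks): each mobile-device agent owns an actor that maps a local observation to a distribution over offload targets and samples an action independently; the Actor-Critic training loop is elided.

```python
# Illustrative per-MD actor: local observation -> softmax over
# {local, edge, cloud}; weights here are random stand-ins, not trained.
import numpy as np

rng = np.random.default_rng(1)
TARGETS = ["local", "edge", "cloud"]

class Agent:
    def __init__(self, obs_dim, n_actions=len(TARGETS)):
        self.W = rng.normal(0, 0.1, size=(n_actions, obs_dim))

    def act(self, obs):
        logits = self.W @ obs
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        return int(rng.choice(len(TARGETS), p=p))

# obs: [task size, deadline slack, edge queue length, cloud RTT]
agents = [Agent(obs_dim=4) for _ in range(3)]
obs = np.array([5.0, 0.2, 3.0, 1.5])
print([TARGETS[a.act(obs)] for a in agents])
```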
Citations: 0
Approximation Algorithms for Scheduling With/Without Deadline Constraints Where Rejection Costs are Proportional to Processing Times
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-03 | DOI: 10.1109/TPDS.2025.3605674
Olivier Beaumont;Rémi Bouzel;Lionel Eyraud-Dubois;Esragul Korkmaz;Laércio Lima Pilla;Alexandre Van Kempen
We study two offline job scheduling problems where tasks can be processed on a limited number of energy-efficient edge machines or offloaded to an unlimited supply of energy-inefficient cloud machines (called rejected). The objective is to minimize total energy consumption. First, we consider scheduling without deadlines, formulating it as a scheduling problem with rejection, where rejection costs are proportional to processing times. We propose a novel $\frac{5}{4}(1+\epsilon)$-approximation algorithm, $\mathcal{BEKP}$, by associating it to a Multiple Subset Sum problem, improving upon the existing $(\frac{3}{2} - \frac{1}{2m})$-approximation for arbitrary rejection costs. Next, we address scheduling with deadlines, aiming to minimize the weighted number of rejected jobs. We position this problem within the literature and introduce a new $(1-\frac{(m-1)^{m}}{m^{m}})$-approximation algorithm, $\mathcal{MDP}$, inspired by an interval selection algorithm with a $(1-\frac{m^{m}}{(m+1)^{m}})$-approximation for arbitrary rejection costs. Experimental results demonstrate that $\mathcal{BEKP}$ and $\mathcal{MDP}$ obtain better results (lower costs or higher profits) than other state-of-the-art algorithms while maintaining a competitive or better time complexity.
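To make the first problem concrete: with m energy-efficient machines and a rejection (cloud) cost of alpha times the processing time, the set of accepted jobs behaves like a Multiple Subset Sum instance, since every unit of processing time kept on an edge machine saves alpha. The sketch below uses plain first-fit decreasing on a simplified capacitated variant (the machine capacity T is an assumption of this sketch); it is a stand-in baseline, not the paper's $\mathcal{BEKP}$ algorithm.

```python
# Baseline for scheduling with proportional rejection costs: pack what fits
# on m machines of capacity T (first-fit decreasing), reject the rest at
# cost alpha * p_j. Illustrative only.
def ffd_with_rejection(jobs, m, T, alpha):
    loads = [0.0] * m
    rejected = []
    for p in sorted(jobs, reverse=True):
        for i in range(m):
            if loads[i] + p <= T:
                loads[i] += p
                break
        else:                      # no machine can fit this job
            rejected.append(p)
    return loads, rejected, alpha * sum(rejected)

loads, rejected, cost = ffd_with_rejection([4, 3, 3, 2, 2], m=2, T=6, alpha=1.5)
print(loads, rejected, cost)       # [6.0, 6.0] [2] 3.0
```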
Citations: 0
DynPipe: Toward Dynamic End-to-End Pipeline Parallelism for Interference-Aware DNN Training
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-03 | DOI: 10.1109/TPDS.2025.3605491
Zhengyi Yuan;Xiong Wang;Yuntao Nie;Yufei Tao;Yuqing Li;Zhiyuan Shao;Xiaofei Liao;Bo Li;Hai Jin
Pipeline parallelism has emerged as an indispensable technique for training large deep neural networks. While existing asynchronous pipeline systems address the time bubbles inherent in synchronous architectures, they continue to suffer from inefficiency and susceptibility to volatile hardware environments due to their suboptimal and static configurations. In this article, we propose DynPipe, an interference-aware asynchronous pipeline framework that optimizes end-to-end training performance in highly dynamic computing environments. By characterizing the non-overlapped communication overheads and the convergence rate conditioned on stage-wise staleness, DynPipe crafts an optimized pipeline partition that harmonizes hardware speed with statistical convergence. Moreover, DynPipe deploys a non-intrusive random forest model that utilizes runtime stage statistics to evaluate the impact of environmental changes, such as task interference and network jitter, on training efficiency. Following this evaluation, DynPipe adaptively adjusts the partition plan to restore both intra- and inter-stage load balancing, facilitating seamless pipeline reconfiguration in dynamic environments. Extensive experiments show that DynPipe outperforms state-of-the-art systems, accelerating time-to-accuracy by 1.5-3.4×.
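The non-intrusive evaluation step can be pictured as a regression problem: fit a random forest on runtime stage statistics and predict the slowdown the current plan would suffer, repartitioning past a threshold. The sketch below uses synthetic data and scikit-learn; the feature set and the 1.3× threshold are assumptions, not DynPipe's configuration.

```python
# Illustrative interference model: stage statistics -> predicted slowdown.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(10, 50, n),   # stage compute time (ms)
    rng.uniform(0, 1, n),     # co-located interference load
    rng.uniform(0, 5, n),     # network jitter (ms)
])
# Synthetic ground truth: interference and jitter inflate step time.
y = 1.0 + 0.8 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 0.02, n)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
pred = model.predict([[30.0, 0.7, 2.0]])[0]
action = "repartition" if pred > 1.3 else "keep current plan"
print(f"predicted slowdown: {pred:.2f}x -> {action}")
```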
Citations: 0
Parallel Wormhole Filters: High-Performance Approximate Membership Query Data Structures for Persistent Memory
IF 6.0 | CAS Tier 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, THEORY & METHODS) | Pub Date: 2025-09-03 | DOI: 10.1109/TPDS.2025.3605780
Hancheng Wang;Haipeng Dai;Shusen Chen;Meng Li;Rong Gu;Youyou Lu;Chengxun Wu;Jiaqi Zheng;Lexi Xu;Guihai Chen
Approximate membership query (AMQ) data structures can approximately determine whether an element exists in a given dataset. They are widely used in parallel and distributed systems (e.g., high-performance databases, distributed cache systems, and bioinformatics systems) to avoid unnecessary dataset accesses, thereby accelerating massive data processing. For AMQ data structures used in such systems, simultaneously achieving high throughput, a low false positive rate, and large capacity is critical but challenging. Porting AMQ data structures from DRAM to persistent memory makes it possible to achieve all three objectives at once, but the porting is not a trivial task: existing AMQ data structures generate numerous random accesses and/or sequential writes on persistent memory, resulting in poor throughput. In the conference version of this paper, we therefore proposed a novel AMQ data structure called the wormhole filter, which achieves high throughput on persistent memory and thereby meets all three objectives simultaneously. In this journal version, we extend our prior work by introducing parallel wormhole filters to enhance parallel performance. Additionally, we integrate parallel wormhole filters into the LevelDB database system to show that porting AMQ data structures to persistent memory significantly improves end-to-end system throughput. Theoretical analysis and experimental results show that wormhole filters significantly outperform state-of-the-art AMQ data structures; for example, they achieve 12.06× the insertion throughput, 1.98× the positive-lookup throughput, and 8.82× the deletion throughput of the best competing baseline.
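The AMQ contract these filters implement is worth pinning down: a lookup may return a false positive but never a false negative. The sketch below demonstrates that contract with a plain Bloom filter; the wormhole filter's persistent-memory layout is deliberately not reproduced here.

```python
# A plain Bloom filter as the simplest AMQ: k salted hashes set/test k bits.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: bytes):
        for i in range(self.k):
            h = hashlib.blake2b(item, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(h, "little") % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"key-42")
print(b"key-42" in bf)  # True: inserted keys are always found
print(b"key-43" in bf)  # almost surely False; True would be a false positive
```

On persistent memory, the access pattern of such structures (many small random bit reads and writes) maps poorly onto the hardware, which is the throughput problem the wormhole layout redesigns around.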
Citations: 0