
IEEE Transactions on Parallel and Distributed Systems — Latest Publications

Optimization Method Based on K-WPA for Multinode Cooperative Localization Formation Grouping
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-16 | DOI: 10.1109/TPDS.2026.3655025
Chun-Li Shao;Liu-Yun He;Pu Yang;Ze-Xia Huang;Guo-Yang Ye
Multinode cooperative systems with flexible grouping capabilities are set to become a future development trend, as they adapt well to complex and dynamic mission requirements. To address the challenge of cooperative node selection in multinode cooperative localization, this study proposes an optimization algorithm for formation grouping in multinode cooperative localization based on the K-means algorithm and the wolf pack algorithm (WPA), referred to as K-WPA. The algorithm incorporates practical constraints to guide multinode cluster grouping, thereby improving grouping efficiency. In accordance with the clustering results, the population update process of the WPA is optimized to avoid convergence to local optima. The objective function of the WPA is designed using the Fisher information matrix, which is also used to evaluate the optimization process of formation grouping. Dynamic grouping simulations are conducted for cooperative systems with 20, 30, and 50 nodes. Results indicate that the proposed K-WPA method improves positioning accuracy by up to 41.24% compared to fixed grouping. Furthermore, by combining space division with parallel grouping optimization, the K-WPA algorithm keeps the average execution time within 1 s for a thousand-node swarm.
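To make the Fisher-information-based grouping objective concrete, the sketch below (Python/NumPy, not the authors' implementation) groups nodes with plain K-means and scores each group's geometry for a 2D range-only localization target by the trace of the inverse Fisher information matrix, a standard position-error bound. The noise model, node coordinates, and function names are illustrative assumptions.

```python
import numpy as np

def fim_range_localization(target, anchors, sigma=1.0):
    """Fisher information matrix for 2D range-only localization.
    Each row of `anchors` is a cooperating node at a known position."""
    fim = np.zeros((2, 2))
    for a in anchors:
        d = a - target
        r = np.linalg.norm(d)
        if r < 1e-9:
            continue
        u = (d / r).reshape(2, 1)          # unit line-of-sight vector
        fim += (u @ u.T) / sigma**2        # rank-1 contribution per range measurement
    return fim

def grouping_cost(target, anchors):
    """Scalar objective: trace of the inverse FIM (lower means better geometry)."""
    fim = fim_range_localization(target, anchors)
    return np.trace(np.linalg.inv(fim + 1e-9 * np.eye(2)))

def kmeans_groups(points, k, iters=50, seed=0):
    """Plain K-means used only to seed the grouping; centers initialized from the data."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(points[:, None] - centers, axis=2), axis=1)
        centers = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

if __name__ == "__main__":
    nodes = np.random.default_rng(1).uniform(0, 100, size=(30, 2))
    labels = kmeans_groups(nodes, k=3)
    target = np.array([50.0, 50.0])
    for g in range(3):
        print(f"group {g}: cost = {grouping_cost(target, nodes[labels == g]):.3f}")
```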
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 697-709.
Citations: 0
Resource-Efficient Personal Large Language Models Fine-Tuning With Collaborative Edge Computing
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-16 | DOI: 10.1109/TPDS.2026.3654957
Shengyuan Ye;Bei Ouyang;Tianyi Qian;Liekang Zeng;Jingyi Li;Jiangsu Du;Xiaowen Chu;Guoliang Xing;Xu Chen
Large language models (LLMs) have enabled transformative applications at the network edge, such as intelligent personal assistants. However, data privacy and security concerns necessitate a shift from cloud-centric paradigms to edge-based fine-tuning for personal LLMs. This transition is significantly hindered by intensive computational requirements and inherent resource scarcity, creating a “resource wall” that compromises training efficiency and feasibility. While current parameter-efficient fine-tuning (PEFT) and resource management strategies attempt to mitigate these constraints, they remain insufficient for the limited capacities of individual edge devices. To address these challenges, we propose PAC+, a resource-efficient collaborative edge AI framework for in-situ personal LLM fine-tuning. PAC+ overcomes the resource bottlenecks through a sophisticated algorithm-system co-design: (1) Algorithmically, PAC+ introduces a fine-tuning technique optimized for parameters, time, and memory. It utilizes Parallel Adapters to circumvent the need for a full backward pass through the LLM backbone. Furthermore, an activation cache mechanism streamlines the process by eliminating redundant forward passes across multiple epochs. (2) At the system level, PAC+ aggregates proximate edge devices into a collective resource pool, employing hybrid data and pipeline parallelism to orchestrate distributed training. By leveraging the activation cache, PAC+ enables the exclusive fine-tuning of Parallel Adapters via data parallelism, effectively bypassing the backbone's constraints. Extensive evaluation of the prototype implementation demonstrates that PAC+ significantly outperforms existing collaborative edge training systems, achieving up to a 9.7× end-to-end speedup. Furthermore, compared to mainstream LLM fine-tuning algorithms, PAC+ reduces memory footprint by up to 88.16%.
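The parallel-adapter and activation-cache ideas can be illustrated with a minimal PyTorch sketch, assuming a single frozen projection stands in for an LLM block: backbone outputs are computed once and cached, so subsequent epochs train only the small adapter added in parallel. Module names and dimensions are placeholders, not the PAC+ code.

```python
import torch
import torch.nn as nn

class ParallelAdapterLayer(nn.Module):
    """Frozen backbone projection plus a small trainable adapter added in parallel."""
    def __init__(self, dim, bottleneck=16):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)       # stands in for a frozen LLM block
        self.backbone.requires_grad_(False)
        self.adapter = nn.Sequential(             # only these weights are trained
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
        self._cache = {}                           # sample id -> cached backbone output

    def forward(self, x, sample_id):
        if sample_id not in self._cache:           # first epoch: run the frozen backbone once
            with torch.no_grad():
                self._cache[sample_id] = self.backbone(x).detach()
        # later epochs: reuse the cached activation, backprop only through the adapter
        return self._cache[sample_id] + self.adapter(x)

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = ParallelAdapterLayer(dim=32)
    opt = torch.optim.SGD(layer.adapter.parameters(), lr=0.1)
    data = [(i, torch.randn(4, 32), torch.randn(4, 32)) for i in range(8)]
    for epoch in range(3):                         # backbone forward happens only in epoch 0
        for sid, x, y in data:
            loss = nn.functional.mse_loss(layer(x, sid), y)
            opt.zero_grad(); loss.backward(); opt.step()
        print(f"epoch {epoch}: loss = {loss.item():.4f}")
```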
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 680-696.
Citations: 0
2025 Reviewers List*
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-14 | DOI: 10.1109/TPDS.2025.3639693
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 2, pp. 593-599. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11353051
Citations: 0
MEOCI: Model Partitioning and Early-Exit Point Selection Joint Optimization for Collaborative Inference in Vehicular Edge Computing
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-12 | DOI: 10.1109/TPDS.2026.3652171
Chunlin Li;Jiaqi Wang;Kun Jiang;Cheng Xiong;Shaohua Wan
In recent years, deep neural networks (DNNs) have been widely used in Vehicular Edge Computing (VEC), becoming the core technology for most intelligent applications. However, these DNN inference tasks are usually computation-intensive and latency-sensitive. In urban autonomous driving scenarios, when a large number of vehicles offload tasks to roadside units (RSUs), the edge servers become computationally overloaded and inference delay exceeds tolerable limits. To address these challenges, we propose an edge-vehicle collaborative inference acceleration mechanism, namely Model partitioning and Early-exit point selection joint Optimization for Collaborative Inference (MEOCI). Specifically, we dynamically select the optimal model partitioning points under the constraints of RSU computing resources and vehicle computing capabilities, and choose an appropriate early-exit point according to a preset accuracy threshold. The goal is to minimize the average inference delay under the inference accuracy constraint. To this end, we propose the Adaptive Dual-Pool Dueling Double Deep Q-Network (ADP-D3QN) algorithm, which enhances the exploration strategy and experience replay mechanism of D3QN to implement the proposed optimization mechanism MEOCI. We conduct comprehensive performance evaluations using four DNN models: AlexNet, VGG16, ResNet50, and YOLOv10n. Experimental results show that the proposed ADP-D3QN algorithm reduces average inference delay by 15.8% for AlexNet and 8.7% for VGG16 compared to the baseline algorithm.
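The joint decision MEOCI optimizes can be sketched as a simple enumeration, under invented per-layer latencies, activation sizes, and exit accuracies: pick an early-exit point that satisfies the accuracy threshold and a partition point that minimizes the estimated end-to-end latency. All numbers and names below are placeholders, not measurements or code from the paper.

```python
def best_partition_and_exit(veh_lat, rsu_lat, out_mb, input_mb,
                            uplink_mbps, exit_acc, acc_min):
    """Enumerate (early-exit point, partition point) pairs and return the
    accuracy-feasible pair with the smallest estimated end-to-end latency."""
    best = None
    for exit_layer, acc in exit_acc.items():
        if acc < acc_min:                      # early exit must meet the accuracy threshold
            continue
        n = exit_layer + 1                     # number of layers executed for this exit
        for cut in range(n + 1):               # cut = index of first layer run on the RSU
            local = sum(veh_lat[:cut])
            remote = sum(rsu_lat[cut:n])
            if cut == n:                       # fully local: nothing is transmitted
                tx = 0.0
            else:                              # send the input or the cut layer's activation
                size = input_mb if cut == 0 else out_mb[cut - 1]
                tx = size / uplink_mbps * 1000.0
            total = local + tx + remote        # milliseconds
            if best is None or total < best[0]:
                best = (total, cut, exit_layer)
    return best                                 # (latency_ms, partition point, exit point)

if __name__ == "__main__":
    veh_lat = [30.0, 40.0, 50.0, 60.0, 45.0]   # per-layer latency on the vehicle (ms)
    rsu_lat = [3.0, 4.0, 5.0, 6.0, 4.0]        # per-layer latency on the RSU (ms)
    out_mb  = [4.0, 2.0, 1.0, 0.5, 0.1]        # activation size after each layer (Mb)
    exit_acc = {2: 0.87, 4: 0.93}              # attainable accuracy at each exit point
    lat, cut, exit_layer = best_partition_and_exit(
        veh_lat, rsu_lat, out_mb, input_mb=6.0,
        uplink_mbps=50.0, exit_acc=exit_acc, acc_min=0.85)
    print(f"exit after layer {exit_layer}, offload from layer {cut}, ~{lat:.1f} ms")
```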
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 666-679.
Citations: 0
FairGFL: Privacy-Preserving Fairness-Aware Federated Learning With Overlapping Subgraphs
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-06 | DOI: 10.1109/TPDS.2025.3649863
Zihao Zhou;Shusen Yang;Fangyuan Zhao;Xuebin Ren
Graph federated learning enables the collaborative extraction of high-order information from distributed subgraphs while preserving the privacy of raw data. However, graph data often exhibits overlap among different clients. Previous research has demonstrated certain benefits of overlapping data in mitigating data heterogeneity. However, the negative effects have not been explored, particularly in cases where the overlaps are imbalanced across clients. In this paper, we uncover the unfairness issue arising from imbalanced overlapping subgraphs through both empirical observations and theoretical reasoning. To address this issue, we propose FairGFL (FAIRness-aware subGraph Federated Learning), a novel algorithm that enhances cross-client fairness while maintaining model utility in a privacy-preserving manner. Specifically, FairGFL incorporates an interpretable weighted aggregation approach to enhance fairness across clients, leveraging privacy-preserving estimation of their overlapping ratios. Furthermore, FairGFL improves the tradeoff between model utility and fairness by integrating a carefully crafted regularizer into the federated composite loss function. Through extensive experiments on four benchmark graph datasets, we demonstrate that FairGFL outperforms four representative baseline algorithms in terms of both model utility and fairness.
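One plausible (assumed, not FairGFL's actual) form of overlap-aware weighted aggregation is sketched below: each client's aggregation weight is its data size discounted by its estimated overlap ratio, followed by normalization and a weighted parameter average.

```python
import numpy as np

def overlap_aware_weights(num_nodes, overlap_ratio):
    """Aggregation weights: start from data size, discount the estimated
    fraction of nodes a client shares with other clients, then normalize."""
    raw = np.asarray(num_nodes, dtype=float) * (1.0 - np.asarray(overlap_ratio))
    return raw / raw.sum()

def aggregate(client_params, weights):
    """Weighted average of client model parameters (one flat vector per client)."""
    stacked = np.stack(client_params)
    return np.average(stacked, axis=0, weights=weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    client_params = [rng.normal(size=10) for _ in range(4)]
    num_nodes = [1200, 800, 800, 300]          # subgraph sizes per client
    overlap_ratio = [0.60, 0.20, 0.20, 0.05]   # estimated overlapping fraction per client
    w = overlap_aware_weights(num_nodes, overlap_ratio)
    print("weights:", np.round(w, 3))
    print("aggregated (first 3 coords):", np.round(aggregate(client_params, w)[:3], 3))
```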
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 710-725.
Citations: 0
SMEStencil: Optimizing High-Order Stencils on ARM Multicore Using SME Unit
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-05 | DOI: 10.1109/TPDS.2025.3650515
Yinuo Wang;Tianqi Mao;Lin Gan;Wubing Wan;Zeyu Song;Jiayu Fu;Lanke He;Wenqiang Wang;Zekun Yin;Wei Xue;Guangwen Yang
Matrix-accelerated stencil computation is a hot research topic, yet its application to 3-dimensional (3D) high-order stencils and HPC remains underexplored. With the emergence of the Scalable Matrix Extension (SME) on ARMv9-A CPUs, we analyze SME-based acceleration strategies and tailor an optimal approach for 3D high-order stencils. We introduce algorithmic optimizations based on the Scalable Vector Extension (SVE) and the SME unit to address strided memory accesses, alignment conflicts, and redundant accesses. We propose memory optimizations to boost on-package memory efficiency, and a novel multi-thread parallelism paradigm to overcome data-sharing challenges caused by the absence of shared data caches. SMEStencil sustains consistently high hardware utilization across diverse stencil shapes and dimensions. Our DMA-based inter-NUMA communication further mitigates NUMA effects and MPI limitations in hybrid parallelism. Combining all these innovations, SMEStencil outperforms state-of-the-art libraries on the Nvidia A100 GPGPU by up to 2.1×. Moreover, the performance improvements enabled by our optimizations translate directly to real-world HPC applications, enabling Reverse Time Migration (RTM) to achieve a 1.8× speedup over a highly optimized Nvidia A100 GPGPU version.
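For readers unfamiliar with the kernel being accelerated, the NumPy sketch below shows a 3D high-order star stencil (radius 4) in its plain vectorized form; it illustrates the strided-access pattern that the paper maps onto SVE/SME, not the SMEStencil implementation, and the coefficients are placeholders.

```python
import numpy as np

def stencil_3d_star(u, coeffs):
    """One sweep of a 3D star (cross) stencil of radius r = len(coeffs) - 1,
    applied to the interior of u. Returns an array of shape u.shape - 2r."""
    r = len(coeffs) - 1
    nx, ny, nz = u.shape
    out = coeffs[0] * u[r:nx - r, r:ny - r, r:nz - r]
    for d in range(1, r + 1):
        c = coeffs[d]
        out = out + c * (u[r - d:nx - r - d, r:ny - r, r:nz - r] +   # -x / +x neighbors
                         u[r + d:nx - r + d, r:ny - r, r:nz - r] +
                         u[r:nx - r, r - d:ny - r - d, r:nz - r] +   # -y / +y neighbors
                         u[r:nx - r, r + d:ny - r + d, r:nz - r] +
                         u[r:nx - r, r:ny - r, r - d:nz - r - d] +   # -z / +z neighbors
                         u[r:nx - r, r:ny - r, r + d:nz - r + d])
    return out

if __name__ == "__main__":
    # Illustrative radius-4 (8th-order-style) coefficients; values are placeholders.
    coeffs = np.array([-2.847, 1.6, -0.2, 0.0254, -0.0018])
    u = np.random.default_rng(0).standard_normal((64, 64, 64))
    v = stencil_3d_star(u, coeffs)
    print(v.shape)   # (56, 56, 56)
```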
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 651-665.
Citations: 0
Rethinking Parameter Tuning in Distributed Storage Systems via Knowledge Graph Query
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2026-01-05 | DOI: 10.1109/TPDS.2025.3650593
Wang Zhang;Hongyu Wang;Zhan Shi;Yutong Wu;Mingjin Li;Tingfang Li;Fang Wang;Dan Feng
The growing volume of performance-critical parameters in distributed storage systems, coupled with diverse and dynamic workload patterns, has significantly increased the complexity of system configuration. These trends have expanded the parameter space while tightening the time window for tuning convergence, making it challenging to maintain high system performance. Existing tuning strategies often struggle to balance thorough parameter exploration with real-time responsiveness, limiting their effectiveness under fast-evolving workloads and heterogeneous deployment environments. To address these challenges, we propose KGQW, the first framework that formulates automated parameter tuning as a knowledge graph query workflow. KGQW models workload features and system parameters as graph vertices, with performance metrics represented as edges, and constructs an initial knowledge graph through lightweight performance tests. Guided by performance prediction and Bayesian-driven exploration, KGQW progressively expands the graph, prunes insensitive parameters, and refines performance relationships to build an informative and reusable knowledge graph that supports rapid configuration retrieval via graph querying. Moreover, KGQW enables efficient knowledge transfer across clusters, substantially reducing the construction cost for new clusters. Experiments on real-world applications and storage clusters demonstrate that KGQW achieves tuning latency on the order of seconds while maintaining or surpassing the performance of state-of-the-art methods. These results highlight the promise of knowledge-driven tuning in meeting the scalability and adaptability demands of modern distributed storage systems.
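A toy sketch of the underlying idea — recording (workload features, configuration, observed performance) triples and answering a tuning query by retrieving the best known configuration for the most similar workload — is shown below. The graph encoding, similarity metric, and example knobs are assumptions for illustration, not KGQW's design.

```python
import numpy as np

class TuningKnowledgeGraph:
    """Workload-feature vertices and parameter-setting vertices, connected by
    edges that carry an observed performance value (e.g., throughput)."""
    def __init__(self):
        self.edges = []          # (workload_vec, config_dict, performance)

    def add_observation(self, workload_vec, config, performance):
        self.edges.append((np.asarray(workload_vec, float), dict(config), performance))

    def query(self, workload_vec):
        """Return the best-performing known config for the nearest stored workload."""
        w = np.asarray(workload_vec, float)
        nearest = min({tuple(wv) for wv, _, _ in self.edges},
                      key=lambda wv: np.linalg.norm(np.array(wv) - w))
        candidates = [(perf, cfg) for wv, cfg, perf in self.edges
                      if tuple(wv) == nearest]
        return max(candidates, key=lambda t: t[0])[1]

if __name__ == "__main__":
    kg = TuningKnowledgeGraph()
    # workload features: (read ratio, avg object size in KB); two illustrative knobs
    kg.add_observation((0.9, 4), {"cache_mb": 512, "threads": 8}, performance=120.0)
    kg.add_observation((0.9, 4), {"cache_mb": 256, "threads": 16}, performance=95.0)
    kg.add_observation((0.2, 64), {"cache_mb": 128, "threads": 32}, performance=80.0)
    print(kg.query((0.85, 8)))   # -> {'cache_mb': 512, 'threads': 8}
```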
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 3, pp. 633-650.
Citations: 0
Optimizing Management of Persistent Data Structures in High-Performance Analytics
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-31 | DOI: 10.1109/TPDS.2025.3646133
Karim Youssef;Keita Iwabuchi;Maya Gokhale;Wu-chun Feng;Roger Pearce
Large-scale data analytics workflows ingest massive input data into various data structures, including graphs and key-value datastores. These data structures undergo multiple transformations and computations and are typically reused in incremental and iterative analytics workflows. Persisting in-memory views of these data structures enables reusing them beyond the scope of a single program run while avoiding repetitive raw data ingestion overheads. Memory-mapped I/O enables persisting in-memory data structures without data serialization and deserialization overheads. However, memory-mapped I/O lacks the key feature of persisting consistent snapshots of these data structures for incremental ingestion and processing. The obstacles to efficient virtual memory snapshots using memory-mapped I/O include background writebacks outside the application’s control, and the significantly high storage footprint of such snapshots. To address these limitations, we present Privateer, a memory and storage management tool that enables storage-efficient virtual memory snapshotting while also optimizing snapshot I/O performance. We integrated Privateer into Metall, a state-of-the-art persistent memory allocator for C++, and the Lightning Memory-Mapped Database (LMDB), a widely-used key-value datastore in data analytics and machine learning. Privateer optimized application performance by 1.22× when storing data structure snapshots to node-local storage, and up to 16.7× when storing snapshots to a parallel file system. Privateer also optimizes storage efficiency of incremental data structure snapshots by up to 11× using data deduplication and compression.
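The snapshot-deduplication idea — persisting only the unique blocks of an in-memory structure across snapshots — can be sketched in a few lines of Python. The block size, hashing scheme, and storage layout below are illustrative assumptions, not Privateer's on-disk format.

```python
import hashlib

class DedupSnapshotStore:
    """Content-addressed block store: each snapshot is a list of block hashes,
    and identical blocks across snapshots are stored only once."""
    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}        # sha256 digest -> block bytes (stored once)
        self.snapshots = {}     # snapshot name -> ordered list of digests

    def snapshot(self, name, data: bytes):
        digests = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            h = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(h, block)     # dedup: keep the first copy only
            digests.append(h)
        self.snapshots[name] = digests

    def restore(self, name) -> bytes:
        return b"".join(self.blocks[h] for h in self.snapshots[name])

if __name__ == "__main__":
    store = DedupSnapshotStore()
    v1 = b"".join(bytes([i]) * 4096 for i in range(4))   # 16 KiB "in-memory structure"
    v2 = bytearray(v1); v2[100] = 0xFF                   # incremental change, one block touched
    store.snapshot("epoch1", v1)
    store.snapshot("epoch2", bytes(v2))
    referenced = sum(len(d) for d in store.snapshots.values())
    print(f"{len(store.blocks)} unique blocks stored for {referenced} referenced blocks")
    assert store.restore("epoch2") == bytes(v2)
```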
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 2, pp. 562-574.
Citations: 0
Faster Vertex Cover Algorithms on GPUs With Component-Aware Parallel Branching
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-23 | DOI: 10.1109/TPDS.2025.3641049
Hussein Amro;Basel Fakhri;Amer E. Mouawad;Izzat El Hajj
Algorithms for finding minimum or bounded vertex covers in graphs use a branch-and-reduce strategy, which involves exploring a highly imbalanced search tree. Prior GPU solutions assign different thread blocks to different sub-trees, while using a shared worklist to balance the load. However, these prior solutions do not scale to large and complex graphs because their unawareness of when the graph splits into components causes them to solve these components redundantly. Moreover, their high memory footprint limits the number of workers that can execute concurrently. We propose a novel GPU solution for vertex cover problems that detects when a graph splits into components and branches on the components independently. Although the need to aggregate the solutions of different components introduces non-tail-recursive branches which interfere with load balancing, we overcome this challenge by delegating the post-processing to the last descendant of each branch. We also reduce the memory footprint by reducing the graph and inducing a subgraph before exploring the search tree. Our solution substantially outperforms the state-of-the-art GPU solution, finishing in seconds when the state-of-the-art solution exceeds 6 hours. To the best of our knowledge, our work is the first to parallelize non-tail-recursive branching patterns on GPUs in a load balanced manner.
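The core algorithmic idea — branch-and-reduce for minimum vertex cover that detects when the graph splits and solves the components independently — is shown below as a small sequential Python sketch; the GPU work distribution and memory-footprint techniques of the paper are not reflected here, and the graph representation is assumed for illustration.

```python
def components(adj, vertices):
    """Connected components of the subgraph induced by `vertices`."""
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend((adj[v] & vertices) - comp)
        seen |= comp
        comps.append(comp)
    return comps

def min_vertex_cover(adj, vertices):
    """Exact minimum vertex cover size via branching; components solved separately."""
    vertices = {v for v in vertices if adj[v] & vertices}    # drop isolated vertices
    if not vertices:
        return 0
    comps = components(adj, vertices)
    if len(comps) > 1:                    # component-aware step: solve pieces independently
        return sum(min_vertex_cover(adj, c) for c in comps)
    v = max(vertices, key=lambda u: len(adj[u] & vertices))  # branch on a max-degree vertex
    take_v = 1 + min_vertex_cover(adj, vertices - {v})
    neigh = adj[v] & vertices
    take_neighbors = len(neigh) + min_vertex_cover(adj, vertices - neigh - {v})
    return min(take_v, take_neighbors)

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (4, 5), (5, 6)]  # a 4-cycle plus a 3-vertex path
    adj = {v: set() for v in range(7)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    print(min_vertex_cover(adj, set(adj)))   # 2 (cycle) + 1 (path) = 3
```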
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 2, pp. 504-517.
Citations: 0
Cost-Effective Empirical Performance Modeling
IF 6.0 | CAS Tier 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2025-12-18 | DOI: 10.1109/TPDS.2025.3646119
Marcus Ritter;Benedikt Naumann;Alexandru Calotoiu;Sebastian Rinke;Thorsten Reimann;Torsten Hoefler;Felix Wolf
Performance models help us to understand how HPC applications scale, which is crucial for efficiently utilizing HPC resources. They describe the performance (e.g., runtime) as a function of one or more execution parameters (e.g., problem size and the degree of parallelism). Creating one manually for a given program is challenging and time-consuming. Automatically learning a model from performance data is a viable alternative, but potentially resource-intensive. Extra-P is a tool that implements this approach. The user begins by selecting values for each parameter. Each combination of values defines a possible measurement point. The choice of measurement points affects the quality and cost of the resulting models, creating a complex optimization problem. A naive approach takes measurements for all possible measurement points, the number of which grows exponentially with the number of parameters. In our earlier work, we demonstrated that a quasi-linear number of points is sufficient and that prioritizing the least expensive points is a generic strategy with a good trade-off between cost and quality. Here, we present an improved selection strategy based on Gaussian process regression (GPR) that selects points individually for each modeling task. In our synthetic evaluation, which was based on tens of thousands of artificially generated functions, the naive approach achieved 66% accuracy with two model parameters and 5% artificial noise. At only 10% of the naive approach's cost, the generic approach already achieved 47.3% accuracy, while the GPR-based approach achieved 77.8% accuracy. Similar improvements were observed in experiments involving different numbers of model parameters and noise levels, as well as in case studies with realistic applications.
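A minimal sketch of cost-aware point selection with Gaussian process regression, in the spirit of the strategy described above: fit a GP to the points measured so far and pick the candidate with the highest predictive uncertainty per unit cost. It uses scikit-learn; the toy cost model, kernel choice, and synthetic runtime function are assumptions, not Extra-P's configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def next_point(measured_x, measured_y, candidates, costs):
    """Pick the candidate measurement point with the highest predictive
    standard deviation per unit cost (cheap, informative points first)."""
    kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(measured_x.shape[1]))
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(measured_x, measured_y)
    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std / costs)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    runtime = lambda p: 0.5 * p[:, 0] * np.log2(p[:, 0]) + 2.0 * p[:, 1]   # hidden "truth"
    # parameters: (problem size, number of processes); measurement cost grows with both
    grid = np.array([[s, p] for s in (16, 32, 64, 128, 256) for p in (2, 4, 8, 16)], float)
    cost = grid[:, 0] * grid[:, 1]
    measured = grid[cost <= 256]                 # start from the cheapest corner of the space
    y = runtime(measured) + rng.normal(0, 0.1, len(measured))
    remaining = grid[cost > 256]
    pick = next_point(measured, y, remaining, cost[cost > 256])
    print("next measurement point (size, processes):", pick)
```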
IEEE Transactions on Parallel and Distributed Systems, vol. 37, no. 2, pp. 575-592.
Citations: 0