
IEEE Transactions on Parallel and Distributed Systems: Latest Articles

SR-FDIL: Synergistic Replay for Federated Domain-Incremental Learning
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-08-02 DOI: 10.1109/TPDS.2024.3436874
Yichen Li;Wenchao Xu;Yining Qi;Haozhao Wang;Ruixuan Li;Song Guo
Federated Learning (FL) allows multiple clients to collaboratively train a model while keeping their data local. However, existing FL approaches typically assume that the data on each client is static and fixed, so they cannot account for incremental data with domain shift. This leads to catastrophic forgetting on previous domains, particularly when clients are common edge devices that may lack enough storage to retain the full samples of each domain. To tackle this challenge, we propose Federated Domain-Incremental Learning via Synergistic Replay (SR-FDIL), which alleviates catastrophic forgetting by coordinating all clients to cache samples and replay them. More specifically, when new data arrives, each client selects the samples to cache based not only on their importance in the local dataset but also on their correlation with the global dataset. Moreover, to balance learning new data against memorizing old data, we propose a novel client selection mechanism that jointly considers the importance of both old and new data. Extensive experiments on several datasets demonstrate that SR-FDIL outperforms state-of-the-art methods by up to 4.05% in average accuracy across all domains.
Vol. 35, No. 11, pp. 1879-1890
Citations: 0
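Illustrative note on SR-FDIL: the abstract says each client chooses which samples to cache using both their importance in the local dataset and their correlation with the global dataset. One way this kind of two-signal selection could look is sketched below. This is a minimal sketch under stated assumptions, not the authors' method: the blended score, the weight `alpha`, and the function name `select_cache` are all hypothetical, and both scores are assumed to be already normalized to [0, 1].

```python
def select_cache(samples, budget, alpha=0.5):
    """Pick `budget` sample ids to cache, ranked by a blended score.

    samples: list of (sample_id, local_importance, global_correlation);
             both scores are assumed to lie in [0, 1].
    alpha:   hypothetical weight on local importance (not from the paper).
    """
    ranked = sorted(
        samples,
        # Blend the two signals; a higher score means "more worth caching".
        key=lambda s: alpha * s[1] + (1.0 - alpha) * s[2],
        reverse=True,
    )
    return [sample_id for sample_id, _, _ in ranked[:budget]]


# Toy data: (id, local importance, correlation with the global dataset).
samples = [("a", 0.9, 0.1), ("b", 0.5, 0.8), ("c", 0.2, 0.3), ("d", 0.7, 0.7)]
print(select_cache(samples, budget=2))
```

Setting `alpha=1.0` reduces the sketch to purely local selection, which makes it easy to see how the global signal changes which samples a storage-limited client keeps.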
Cost-Effective and Robust Service Provisioning in Multi-Access Edge Computing
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-30 DOI: 10.1109/TPDS.2024.3435929
Zhengzhe Xiang;Yuhang Zheng;Dongjing Wang;Javid Taheri;Zengwei Zheng;Minyi Guo
With the development of multi-access edge computing (MEC) technology, an increasing number of researchers and developers are deploying their computation-intensive and IO-intensive services (especially AI services) on edge devices. These devices, being close to end users, provide better performance in mobile environments. By constructing a service provisioning system at the network edge, latency is significantly reduced due to short-distance communication with edge servers. However, since the MEC-based service provisioning system is resource-sensitive and the network may be unstable, careful resource allocation and traffic scheduling strategies are essential. This paper investigates and quantifies the cost-effectiveness and robustness of the MEC-based service provisioning system with the applied resource allocation and traffic scheduling strategies. Based on this analysis, a cost-effective and robust service provisioning algorithm, termed CERA, is proposed to minimize deployment costs while maintaining system robustness. Extensive experiments are conducted to compare the proposed approach with well-known baseline algorithms and evaluate factors impacting the results. The findings demonstrate that CERA achieves at least 15.9% better performance than other baseline algorithms across various instances.
Vol. 35, No. 10, pp. 1765-1779
Citations: 0
Privacy Preserving Task Push in Spatial Crowdsourcing With Unknown Popularity
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-29 DOI: 10.1109/TPDS.2024.3434978
Yin Xu;Mingjun Xiao;Jie Wu;He Sun
In this paper, we investigate the privacy-preserving task push problem with unknown popularity in Spatial Crowdsourcing (SC), where the platform needs to select some tasks with unknown popularity and push them to workers. Meanwhile, the preferences of workers and the popularity values of tasks might involve some sensitive information, which should be protected from disclosure. To address these concerns, we propose a Privacy Preserving Auction-based Bandit scheme, termed PPAB. Specifically, on the basis of the Combinatorial Multi-armed Bandit (CMAB) game, we first construct a Differentially Private Auction-based CMAB (DPA-CMAB) model. Under the DPA-CMAB model, we design a privacy-preserving arm-pulling policy based on Diffie-Hellman (DH), Differential Privacy (DP), and upper confidence bound, which includes the DH-based encryption mechanism and the hybrid DP-based protection mechanism. The policy can not only learn the popularity of tasks and make online task push decisions, but also protect the popularity and workers’ preferences from being revealed. Meanwhile, we design an auction-based incentive mechanism to determine the payment for each selected task. Furthermore, we conduct an in-depth analysis of the security and online performance of PPAB, and prove that PPAB satisfies some desired properties (i.e., truthfulness, individual rationality, and computational efficiency). Finally, extensive simulations on a real-world dataset confirm the performance of PPAB.
Vol. 35, No. 11, pp. 2039-2053
Citations: 0
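Illustrative note on PPAB: the abstract combines an upper-confidence-bound arm-pulling policy with differential privacy in a combinatorial bandit. The sketch below shows only the generic shape of such a policy: pick the k tasks with the largest UCB index, computed from an empirical mean perturbed with Laplace noise. It is an assumption-laden toy, not the paper's hybrid mechanism. The noise scale `1 / (epsilon * n)`, the function names, and the reward range [0, 1] are hypothetical; the DH-based encryption, the auction, and privacy accounting across rounds are omitted.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def pick_tasks(means, counts, t, k, epsilon, rng):
    """Return the indices of the k tasks with the largest noisy UCB index.

    means[i]  : empirical mean popularity of task i (rewards assumed in [0, 1])
    counts[i] : number of times task i has been pushed so far
    t         : current round number (t >= 1)
    epsilon   : hypothetical per-release privacy parameter
    """
    scores = []
    for i, (mean, n) in enumerate(zip(means, counts)):
        if n == 0:
            # Unexplored tasks are always tried first.
            scores.append((math.inf, i))
            continue
        bonus = math.sqrt(2.0 * math.log(t) / n)  # exploration bonus
        noisy_mean = mean + laplace_noise(1.0 / (epsilon * n), rng)
        scores.append((noisy_mean + bonus, i))
    scores.sort(reverse=True)
    return sorted(i for _, i in scores[:k])


rng = random.Random(0)
print(pick_tasks([0.0, 0.5, 0.0], [0, 10, 0], t=11, k=2, epsilon=1.0, rng=rng))
```

A smaller `epsilon` injects more noise into the popularity estimates, so the sketch makes the privacy and learning-speed tension in the abstract concrete.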
A State-of-the-Art Review with Code about Connected Components Labeling on GPUs
IF 5.3 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-29 DOI: 10.1109/tpds.2024.3434357
Federico Bolelli, Stefano Allegretti, Luca Lumetti, Costantino Grana
Citations: 0
SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter Array
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-26 DOI: 10.1109/TPDS.2024.3434332
Teng Gao;Lan Huang;Shang Gao;Kangping Wang
The use of reconfigurable circuits with parallel computing capabilities has been explored to enhance sorting performance and reduce power consumption. Nonetheless, most sorting algorithms utilizing dedicated processors are designed solely based on the parallelization of the algorithm, without considering specialized hardware structures. This leads to problems including, but not limited to, excessive consumption of I/O interface resources and on-chip storage resources, and complex layout wiring. In this paper, we propose a Systolic Sorter Array (SSA), implemented by a Uniform Recurrence Equation (URE), that is highly parameterized in terms of data size, bit width, and type. Leveraging this uniformly recursive structure, the sorter can simultaneously sort two independent sequences. In addition, we implemented global and local control modes on the FPGA to achieve higher computational frequencies. In our experiments, we demonstrate the speed-up ratio of SSA relative to other state-of-the-art (SOTA) sorting algorithms, using C++ std::sort() as the benchmark. Inheriting the benefits of the systolic array architecture, the SSA reaches a computing frequency of up to 810 MHz on the U200. The results of our study show that SSA outperforms other sorting algorithms in terms of throughput, speed-up ratio, and computation frequency.
Vol. 35, No. 10, pp. 1721-1734
Citations: 0
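Illustrative note on SSA: the paper's sorter is a hardware design derived from a Uniform Recurrence Equation, and the abstract does not give its cell logic. As a rough intuition for how a linear array of identical compare-exchange cells can sort, the sketch below models odd-even transposition sort in software, a classic array-sorting scheme that is swapped in here for illustration and is not the SSA design. Each outer iteration stands for one clock step in which disjoint neighbouring pairs compare and swap; in hardware those pairs would operate in parallel.

```python
def odd_even_transposition_sort(values):
    """Software model of a linear compare-exchange array (not the SSA design).

    After n steps on n cells the sequence is sorted in ascending order.
    """
    cells = list(values)
    n = len(cells)
    for step in range(n):
        # Even steps pair cells (0,1), (2,3), ...; odd steps pair (1,2), (3,4), ...
        start = step % 2
        for i in range(start, n - 1, 2):
            # One compare-exchange cell: keep the smaller value on the left.
            if cells[i] > cells[i + 1]:
                cells[i], cells[i + 1] = cells[i + 1], cells[i]
    return cells


print(odd_even_transposition_sort([5, 1, 4, 2, 3]))
```

Because every cell only talks to its immediate neighbours, this style of sorter avoids long wires, which is the kind of layout concern the abstract raises about purely algorithm-driven designs.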
Long-Range MD Electrostatics Force Computation on FPGAs
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-26 DOI: 10.1109/TPDS.2024.3434347
Sahan Bandara;Anthony Ducimo;Chunshu Wu;Martin Herbordt
Strong scaling of long-range electrostatic force computation, which is a central concern of long timescale molecular dynamics simulations, is challenging for CPUs and GPUs due to its complex communication structure and global communication requirements. The scalability challenge is seen especially in small simulations of tens to hundreds of thousands of atoms that are of interest to many important applications such as physics-driven drug discovery. FPGA clusters, with their direct, tightly coupled, low-latency interconnects, are able to address these requirements. For FPGA MD clusters to be effective, however, single device performance must also be competitive. In this work, we leverage the inherent benefits of FPGAs to implement a long-range electrostatic force computation architecture. We present an overall framework with numerous algorithmic, mapping, and architecture innovations, including a unified interleaved memory, a spatial scheduling algorithm, and a design for seamless integration with the larger MD system. We examine a number of alternative configurations based on different resource allocation strategies and user parameters. We show that the best configuration of this architecture, implemented on an Intel Agilex FPGA, can achieve 2124 ns and 287 ns of simulated time per day of wall-clock time for the two molecular dynamics benchmarks DHFR and ApoA1, which simulate 23K and 92K particles, respectively.
Vol. 35, No. 10, pp. 1690-1707
Citations: 0
Redundancy-Free and Load-Balanced TGNN Training With Hierarchical Pipeline Parallelism
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-24 DOI: 10.1109/TPDS.2024.3432855
Yaqi Xia;Zheng Zhang;Donglin Yang;Chuang Hu;Xiaobo Zhou;Hongyang Chen;Qianlong Sang;Dazhao Cheng
Recently, Temporal Graph Neural Networks (TGNNs), as an extension of Graph Neural Networks, have demonstrated remarkable effectiveness in handling dynamic graph data. Distributed TGNN training requires efficiently tackling temporal dependency, which often leads to excessive cross-device communication that generates significant redundant data. However, existing systems are unable to remove the redundancy in data reuse and transfer, and suffer from severe communication overhead in a distributed setting. This work introduces Sven, a co-designed algorithm-system library aimed at accelerating TGNN training on a multi-GPU platform. Exploiting dependency patterns of TGNN models, we develop a redundancy-free graph organization to mitigate redundant data transfer. Additionally, we investigate communication imbalance issues among devices and formulate the graph partitioning problem as minimizing the maximum communication balance cost, which we prove to be NP-hard. We propose an approximation algorithm called Re-FlexBiCut to tackle this problem. Furthermore, we incorporate prefetching, adaptive micro-batch pipelining, and asynchronous pipelining to present a hierarchical pipelining mechanism that mitigates the communication overhead. Sven represents the first comprehensive optimization solution for scaling memory-based TGNN training. In extensive experiments conducted on a 64-GPU cluster, Sven achieves speedups ranging from 1.9x to 3.5x compared to state-of-the-art approaches. Additionally, Sven achieves up to 5.26x higher communication efficiency and reduces communication imbalance by up to 59.2%.
Vol. 35, No. 11, pp. 1904-1919
Citations: 0
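Illustrative note on Sven: the abstract formulates partitioning as minimizing the maximum communication cost across devices and notes the problem is NP-hard. To give a feel for this min-max objective, the sketch below uses a simple greedy heuristic (largest cost first, always onto the currently lightest device). This is a stand-in chosen for brevity and is not Re-FlexBiCut: it assumes each graph chunk has a fixed, independent communication cost, which ignores the cut structure a real graph partitioner must handle. The names `balance_partitions` and `costs` are hypothetical.

```python
import heapq


def balance_partitions(costs, num_devices):
    """Greedy min-max assignment of chunks to devices (not Re-FlexBiCut).

    costs: dict mapping chunk name -> assumed communication cost.
    Returns (assignment, loads), both keyed by device index.
    """
    heap = [(0.0, d) for d in range(num_devices)]  # (current load, device)
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_devices)}
    # Place the most expensive chunks first, each on the lightest device so far.
    for chunk, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        load, device = heapq.heappop(heap)
        assignment[device].append(chunk)
        heapq.heappush(heap, (load + cost, device))
    loads = {d: sum(costs[c] for c in chunks) for d, chunks in assignment.items()}
    return assignment, loads


costs = {"p0": 8, "p1": 7, "p2": 6, "p3": 5, "p4": 4}
assignment, loads = balance_partitions(costs, num_devices=2)
print(assignment, loads)
```

On this toy input the greedy result has a maximum load of 17, while the split {p0, p1} versus {p2, p3, p4} reaches 15, which illustrates why the paper turns to a dedicated approximation algorithm for the problem.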
IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs
IF 5.6 Zone 2, Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-07-23 DOI: 10.1109/TPDS.2024.3432579
Cunyang Wei;Haipeng Jia;Yunquan Zhang;Jianyu Yao;Chendi Li;Wenxuan Cao
The matrix multiplication algorithm is a fundamental numerical technique in linear algebra and plays a crucial role in many scientific computing applications. Although mainstream basic linear algebra libraries deliver high performance for large-scale dense matrix multiplications, they perform poorly on matrix multiplication with irregular input. This paper proposes an input-aware tuning framework that accounts for application scenarios and computer architectures to provide high-performance irregular matrix multiplication on ARMv8 and X86 CPUs. The framework comprises two stages: the install-time stage and the run-time stage. The install-time stage utilizes our proposed computational template to generate high-performance kernels for general data layout and SIMD-friendly data layout. The run-time stage utilizes a tiling algorithm suitable for irregular GEMM to select the optimal kernels and link them into an execution plan. Additionally, load-balanced multi-threaded optimization algorithms are defined to exploit the multi-threading capability of modern processors. Experiments demonstrate that the proposed IrGEMM framework can achieve significant performance improvements for irregular GEMM on both ARMv8 and X86 CPUs compared to other mainstream BLAS libraries.
矩阵乘法算法是线性代数中的一项基本数值技术,在许多科学计算应用中发挥着至关重要的作用。尽管主流的基本线性代数库在大规模密集矩阵乘法中表现出很高的性能,但在应用于不规则输入的矩阵乘法时却表现不佳。本文提出了一个输入感知调优框架,该框架考虑了应用场景和计算机架构,可在 ARMv8 和 X86 CPU 上提供高性能的不规则矩阵乘法。该框架包括两个阶段:安装阶段和运行阶段。安装阶段利用我们提出的计算模板,为通用数据布局和 SIMD 友好数据布局生成高性能内核。运行阶段利用适合不规则 GEMM 的平铺算法,选择最佳内核和链接作为执行计划。此外,还定义了负载平衡多线程优化算法,以利用现代处理器的多线程能力。实验证明,与其他主流 BLAS 库相比,所提出的 IrGEMM 框架可以在 ARMv8 和 X86 CPU 上显著提高不规则 GEMM 的性能。
{"title":"IrGEMM: An Input-Aware Tuning Framework for Irregular GEMM on ARM and X86 CPUs","authors":"Cunyang Wei;Haipeng Jia;Yunquan Zhang;Jianyu Yao;Chendi Li;Wenxuan Cao","doi":"10.1109/TPDS.2024.3432579","DOIUrl":"10.1109/TPDS.2024.3432579","url":null,"abstract":"The matrix multiplication algorithm is a fundamental numerical technique in linear algebra and plays a crucial role in many scientific computing applications. Despite the high performance of mainstream basic linear algebra libraries for large-scale dense matrix multiplications, they exhibit poor performance when applied to matrix multiplication with irregular input. This paper proposes an input-aware tuning framework that accounts for application scenarios and computer architectures to provide high-performance irregular matrix multiplication on ARMv8 and X86 CPUs. The framework comprises two stages: the install-time stage and the run-time stage. The install-time stage utilizes our proposed computational template to generate high-performance kernels for general data layout and SIMD-friendly data layout. The run-time stage utilizes a tiling algorithm suitable for irregular GEMM to select the optimal kernel and link as an execution plan. Additionally, load-balanced multi-threaded optimization algorithms are defined to exploit the multi-threading capability of modern processors. 
Experiments demonstrate that the proposed IrGEMM framework can achieve significant performance improvements for irregular GEMM on both ARMv8 and X86 CPUs compared to other mainstream BLAS libraries.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 9","pages":"1672-1689"},"PeriodicalIF":5.6,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141772338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sophisticated Orchestrating Concurrent DLRM Training on CPU/GPU Platform
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-07-23 | DOI: 10.1109/TPDS.2024.3432620
Rui Tian;Jiazhi Jiang;Jiangsu Du;Dan Huang;Yutong Lu
Recommendation systems are essential to the operation of the majority of internet services, with Deep Learning Recommendation Models (DLRMs) serving as a crucial component. However, due to the distinct computation, data access, and memory usage characteristics of recommendation models, the training of DLRMs may suffer from low resource utilization on prevalent heterogeneous CPU-GPU hardware platforms. Furthermore, as the majority of high-performance computing systems presently depend on multi-GPU computing nodes, the challenge of addressing low resource utilization becomes even more pronounced. Existing concurrent training solutions cannot be straightforwardly applied to DLRM due to various factors, such as insufficient fine-grained memory management and the lack of collaborative CPU-GPU scheduling. In this paper, we introduce RMixer, a scheduling framework that addresses these challenges by providing an efficient job management and scheduling mechanism for DLRM training jobs on heterogeneous CPU-GPU platforms. To facilitate training co-location, we first estimate the peak memory consumption of each job. Additionally, we track and collect resource utilization for DLRM training jobs. Based on the information of computational patterns, a batched job dispatcher with a dynamic resource-complementary scheduling policy is proposed to co-locate DLRM training jobs on CPU-GPU platforms. Scheduling strategies for both intra-GPU and inter-GPU scenarios were meticulously devised, with a focus on thoroughly examining individual GPU resource utilization and achieving a balanced state across multiple GPUs. Experimental results demonstrate that our implementation achieved up to 5.3× and 7.5× higher throughput on a single GPU and on 4 GPUs, respectively, for training jobs involving various recommendation models.
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 11, pp. 2177-2192.
Citations: 0
DeepTM: Efficient Tensor Management in Heterogeneous Memory for DNN Training
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-07-22 | DOI: 10.1109/TPDS.2024.3431910
Haoran Zhou;Wei Rang;Hongyang Chen;Xiaobo Zhou;Dazhao Cheng
Deep Neural Networks (DNNs) have gained widespread adoption in diverse fields, including image classification, object detection, and natural language processing. However, training large-scale DNN models often encounters significant memory bottlenecks, which call for efficient management of extensive tensors. Heterogeneous memory systems, which combine persistent memory (PM) modules with traditional DRAM, offer an economically viable solution to tensor management challenges during DNN training. However, existing memory management methods on heterogeneous memory systems often lead to low PM access efficiency, low bandwidth utilization, and incomplete analysis of model characteristics. To overcome these hurdles, we introduce an efficient tensor management approach, DeepTM, tailored for heterogeneous memory to alleviate memory bottlenecks during DNN training. DeepTM employs page-level tensor aggregation to enhance PM read and write performance and executes contiguous page migration to increase memory bandwidth. Through an analysis of tensor access patterns and model characteristics, we quantify the overall performance and transform the performance optimization problem into the framework of Integer Linear Programming. Additionally, we achieve tensor heat recognition by dynamically adjusting the weights of four key tensor characteristics and develop a global optimization strategy using Deep Reinforcement Learning. To validate the efficacy of our approach, we implement and evaluate DeepTM using the TensorFlow framework running on a PM-based heterogeneous memory system. The experimental results demonstrate that DeepTM achieves performance improvements of up to 36% and 49% compared to the current state-of-the-art memory management strategies AutoTM and Sentinel, respectively. Furthermore, our solution reduces the overhead by a factor of 18 and achieves up to 29% cost reduction compared to AutoTM.
IEEE Transactions on Parallel and Distributed Systems, vol. 35, no. 11, pp. 1920-1935.
Citations: 0