
IEEE Transactions on Parallel and Distributed Systems: Latest Articles

Proteus: Simulating the Performance of Distributed DNN Training
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-14 | DOI: 10.1109/TPDS.2024.3443255
Jiangfei Duan; Xiuhong Li; Ping Xu; Xingcheng Zhang; Shengen Yan; Yun Liang; Dahua Lin
DNN models are growing ever larger to achieve unprecedented accuracy, and the accompanying increase in computation and memory requirements necessitates massive clusters and elaborate parallelization strategies to accelerate DNN training. To better optimize performance and analyze cost, it is essential to model the training throughput of distributed DNN training. However, complex parallelization strategies and the complex runtime behaviors they produce make it challenging to construct an accurate performance model. In this article, we present Proteus, the first standalone simulator that models the performance of complex parallelization strategies through simulated execution. Proteus first represents complex parallelization strategies with a unified representation named Strategy Tree. It then compiles the strategy tree into a distributed execution graph and simulates complex runtime behaviors, namely computation-communication (comp-comm) overlap and bandwidth sharing, with a Hierarchical Topo-Aware Executor (HTAE). Finally, we evaluate Proteus across a wide variety of DNNs on three hardware configurations. Experimental results show that Proteus achieves a 3.0% average prediction error and preserves the relative ordering of training throughput across various parallelization strategies. Compared with state-of-the-art approaches, Proteus reduces prediction error by up to 133.8%.
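To make the two runtime behaviors named above concrete, here is a minimal toy sketch, assuming a single link and equal (processor-sharing) bandwidth division; it is not Proteus's HTAE. `shared_link_finish_times` shows bandwidth sharing among concurrent transfers, and `step_time` shows comp-comm overlap where each gradient bucket can be sent as soon as its computation finishes. Both function names and the bucket model are hypothetical.

```python
def shared_link_finish_times(sizes, bandwidth):
    """Finish time of each transfer when all start at t=0 and split one
    link's bandwidth equally among the transfers still in flight."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    finish = [0.0] * len(sizes)
    t = 0.0
    delivered = 0.0  # bytes already delivered to every still-active transfer
    for k, idx in enumerate(order):
        active = len(order) - k
        # each active transfer receives bandwidth / active bytes per second
        t += (sizes[idx] - delivered) * active / bandwidth
        delivered = sizes[idx]
        finish[idx] = t
    return finish


def step_time(compute, comm, overlap):
    """Time of one training step where compute[i] produces gradient bucket i,
    which must then be communicated (comm[i]) over a single link."""
    if not overlap:
        return sum(compute) + sum(comm)
    t_comp = 0.0
    t_comm = 0.0
    for c, m in zip(compute, comm):
        t_comp += c
        # a bucket is sent once it is ready AND the link is free
        t_comm = max(t_comm, t_comp) + m
    return t_comm
```

With three compute units of 2 and three transfers of 1, overlap shortens the step from 9 to 7 time units, because only the last bucket's transfer remains exposed.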
Citations: 0
Opca: Enabling Optimistic Concurrent Access for Multiple Users in Oblivious Data Storage
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-12 | DOI: 10.1109/TPDS.2024.3441623
Yuezhi Che; Dazhao Cheng; Xiao Wang; Rujia Wang
The data privacy and security challenges posed by data outsourcing are becoming increasingly prevalent. Oblivious RAM (ORAM)-based oblivious data storage guarantees data confidentiality through data encryption and access-pattern obfuscation. However, it suffers from performance degradation and low throughput. To address these issues, prior work has explored ORAM concurrency in multi-user scenarios. We investigate several existing concurrent oblivious data storage solutions and find that they use a trusted proxy to serve concurrent accesses between users and storage, with locks in the proxy to ensure correctness and prevent conflicts. Such proxy-based systems inherently rely on pessimistic concurrency control, and as the number of users grows, the proxy can become a performance bottleneck that causes significant delays. In this study, we propose Opca, a novel oblivious data storage framework that enables optimistic concurrent access. Opca refines the proxy design by temporarily storing multiple versions of modified data labeled with timestamps and committing only the latest version to storage during a separate processing period. We implement and evaluate Opca on different real-world storage backends with a scalable number of users and compare its performance with alternative schemes. Opca outperforms TaoStore, the state-of-the-art concurrent oblivious storage system, which relies on a similar system setting. Our results show that Opca improves throughput by 3.77x and reduces response time by 73.5%.
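The multi-version idea described above (buffer timestamped versions, commit only the newest in a separate phase) can be illustrated with a small sketch. This is an assumption-laden toy, not Opca's protocol: it models no ORAM paths, encryption, or obliviousness, `VersionedProxyBuffer` is a hypothetical name, and `storage` is a plain dict standing in for the backend.

```python
import itertools
import threading


class VersionedProxyBuffer:
    """Hypothetical proxy-side write buffer: each write appends a timestamped
    version instead of waiting on a per-key lock, and a separate flush phase
    commits only the newest version of each key to the backend."""

    def __init__(self):
        self._clock = itertools.count(1)  # monotonically increasing timestamps
        self._versions = {}               # key -> list of (timestamp, value)
        self._mutex = threading.Lock()    # protects only this in-memory map

    def write(self, key, value):
        with self._mutex:
            ts = next(self._clock)
            self._versions.setdefault(key, []).append((ts, value))
            return ts

    def read(self, key):
        """Newest buffered value, or None if the key must be fetched from storage."""
        with self._mutex:
            versions = self._versions.get(key)
            return max(versions)[1] if versions else None

    def flush(self, storage):
        """Commit the latest version of every buffered key, then start a new epoch."""
        with self._mutex:
            pending, self._versions = self._versions, {}
        for key, versions in pending.items():
            storage[key] = max(versions)[1]  # older versions are discarded
```

Because timestamps are unique, `max` over `(timestamp, value)` tuples never needs to compare the values themselves.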
Citations: 0
Which Coupled is Best Coupled? An Exploration of AIMC Tile Interfaces and Load Balancing for CNNs
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-02 | DOI: 10.1109/TPDS.2024.3437657
Joshua Klein; Irem Boybat; Giovanni Ansaloni; Marina Zapater; David Atienza
Due to stringent energy and performance constraints, edge AI computing often employs heterogeneous systems that combine general-purpose CPUs with accelerators. Analog in-memory computing (AIMC) is a well-known AI inference solution that overcomes computational bottlenecks by performing matrix-vector multiplications (MVMs) in constant time. However, AIMC-based accelerator tiles are limited in the number of weights they can hold. State-of-the-art research often sizes neural networks to fit AIMC tiles (or vice versa) but does not consider cases where AIMC tiles cannot cover the whole network, whether due to limited tile resources or network size. In this work, we study the trade-offs among available AIMC tile resources, neural network coverage, AIMC tile proximity to compute resources, and multi-core load-balancing techniques. We first study the single-layer performance and energy scalability of AIMC tiles for the two most typical AIMC acceleration targets: dense (fully-connected) layers and convolutional layers. This study guides our methodology for allocating parameters to AIMC tiles in large edge neural networks, both when AIMC tiles are close to the CPU (tightly coupled) and cannot share resources across the system, and when AIMC tiles are far from the CPU (loosely coupled) and can employ workload stealing. We explore the performance and energy trends of six modern CNNs under different load-balancing methods for differently coupled system configurations with variable AIMC tile resources. We show that, by properly distributing workloads, AIMC acceleration can be highly effective even on under-provisioned systems. For example, we measured a 5.9x speedup and 5.6x energy gains on an 8-core system with 41% of the neural network parameters covered by AIMC tiles.
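The benefit of dynamic load balancing when only part of a network fits on AIMC tiles can be shown with a toy model. In this sketch (an assumption, not the authors' methodology) layers mapped to AIMC become cheaper by a hypothetical `aimc_speedup` factor, a fixed round-robin assignment stands in for cores that cannot share work, and greedy list scheduling from a shared queue stands in for workload stealing. All costs and function names are hypothetical.

```python
import heapq


def layer_costs(cpu_costs, on_aimc, aimc_speedup):
    """Per-layer cost: layers mapped to AIMC tiles run aimc_speedup times faster."""
    return [c / aimc_speedup if mapped else c for c, mapped in zip(cpu_costs, on_aimc)]


def static_makespan(task_costs, n_cores):
    """Round-robin assignment fixed ahead of time (no sharing between cores)."""
    loads = [0.0] * n_cores
    for i, c in enumerate(task_costs):
        loads[i % n_cores] += c
    return max(loads)


def greedy_makespan(task_costs, n_cores):
    """The earliest-free core pulls the next task from a shared queue, a simple
    stand-in for dynamic load balancing / work stealing."""
    heap = [0.0] * n_cores  # current finish time of each core
    heapq.heapify(heap)
    for c in task_costs:
        t = heapq.heappop(heap)  # earliest-free core takes the task
        heapq.heappush(heap, t + c)
    return max(heap)
```

With costs `[8, 1, 1, 1, 8, 1, 1, 1]` on two cores, round-robin puts both heavy tasks on one core (makespan 18), while the shared queue reaches the ideal 11.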
Citations: 0
Locality-Preserving Graph Traversal With Split Live Migration
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-02 | DOI: 10.1109/TPDS.2024.3436828
Rong Chen; Xingda Wei; Xiating Xie; Haibo Chen
Graphs model many kinds of real-world data, such as social, transportation, biological, and communication data. Hence, graph traversal, including multi-hop and graph-walking queries, has become the key operation atop graph stores. However, since different graph traversals may touch different sets of vertices, it is hard or even impossible to devise a one-size-fits-all graph partitioning algorithm that preserves access locality for all graph traversal workloads. Meanwhile, prior shard-based migration faces a dilemma: coarse-grained migration may incur migration overhead that outweighs its locality benefits, while fine-grained migration usually requires excessive metadata and incurs non-trivial maintenance costs. We present Pragh, an efficient, locality-preserving live graph migration scheme for graph stores that hold data as key-value pairs. The key idea of Pragh is a split migration model that physically migrates only values while retaining keys in their initial location. This enables fine-grained migration without maintaining excessive metadata. Pragh integrates an RDMA-friendly location cache from DrTM-KV to provide fully localized access to migrated data and further reuses the cache replacement policy, in a novel way, for lightweight monitoring. Pragh also supports evolving graphs through a check-and-forward mechanism that resolves conflicts between updates and migration of graph data. Evaluations on an 8-node RDMA-capable cluster (100 Gbps) using a representative graph traversal benchmark show that Pragh increases throughput by up to 19× and decreases median latency by up to 94%, thanks to split live migration eliminating 97% of remote accesses. Porting split live migration to Wukong yields up to 2.53× throughput improvement on representative workloads such as LUBM-10240, thanks to an 88% reduction in remote accesses. This further confirms the effectiveness and generality of Pragh.
Finally, although Pragh focuses on RDMA-based graph traversal, we demonstrate its generality by extending it to support graph traversals over traditional networking. Evaluations of graph traversal benchmarks and graph query workloads on the same cluster, but with a 10 Gbps TCP/IP network, further confirm its effectiveness without RDMA. Specifically, on LUBM-10240, Wukong-TCP with Pragh achieves up to 1.87× throughput improvement with a 56% decrease in remote accesses.
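The split migration model (keys stay at their home node, only values move, and readers cache value locations) can be sketched as below. This is a single-process toy under stated assumptions: `Node`, `SplitStore`, and `home_of` are hypothetical names, there is no RDMA or DrTM-KV cache, and stale cache entries are invalidated eagerly on migration, which is a simplification and not Pragh's check-and-forward mechanism.

```python
class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.index = {}   # key -> id of the node currently holding its value
        self.values = {}  # values physically stored on this node


class SplitStore:
    """Toy split-migration store: a key's index entry never leaves its home
    node; only the value moves. Readers cache value locations."""

    def __init__(self, n_nodes, home_of):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.home_of = home_of    # key -> home node id (static partitioning)
        self.location_cache = {}  # (reader id, key) -> holder node id
        self.remote_accesses = 0

    def put(self, key, value):
        home = self.nodes[self.home_of(key)]
        home.index[key] = home.node_id
        home.values[key] = value

    def migrate_value(self, key, dst):
        """Move only the value to node `dst`; the key stays at its home node."""
        home = self.nodes[self.home_of(key)]
        src = self.nodes[home.index[key]]
        self.nodes[dst].values[key] = src.values.pop(key)
        home.index[key] = dst
        # simplification: eagerly drop cached locations for this key
        self.location_cache = {
            ck: holder for ck, holder in self.location_cache.items() if ck[1] != key
        }

    def get(self, reader, key):
        holder = self.location_cache.get((reader, key))
        if holder is None:
            home_id = self.home_of(key)
            if home_id != reader:
                self.remote_accesses += 1  # remote lookup at the home node
            holder = self.nodes[home_id].index[key]
            self.location_cache[(reader, key)] = holder
        if holder != reader:
            self.remote_accesses += 1      # remote value fetch
        return self.nodes[holder].values[key]
```

After the value of key 1 migrates to its frequent reader (node 0), the first read costs one remote lookup and every later read is fully local, while the home node keeps only the small index entry.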
Citations: 0
Distributed Evolution Strategies With Multi-Level Learning for Large-Scale Black-Box Optimization
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-02 | DOI: 10.1109/TPDS.2024.3437688
Qiqi Duan; Chang Shao; Guochen Zhou; Minghan Zhang; Qi Zhao; Yuhui Shi
In the post-Moore era, the main performance gains of black-box optimizers increasingly depend on parallelism, especially for large-scale optimization (LSO). Here we propose to parallelize the well-established covariance matrix adaptation evolution strategy (CMA-ES), in particular one of its latest LSO variants, limited-memory CMA-ES (LM-CMA). To achieve efficiency while approximating its powerful invariance property, we present a multilevel learning-based meta-framework for distributed LM-CMA. Owing to its hierarchical structure, Meta-ES is well suited to implement our distributed meta-framework: the outer ES controls strategy parameters while all parallel inner ESs run the serial LM-CMA with different settings. For the distribution mean update of the outer ES, the elitist and multi-recombination strategies are used in parallel to avoid stagnation and regression, respectively. To exploit spatiotemporal information, the global step-size adaptation combines Meta-ES with parallel cumulative step-size adaptation. After each isolation time, our meta-framework employs both structure and parameter learning strategies to combine aligned evolution paths for CMA reconstruction. Experiments on a set of large-scale benchmark functions with memory-intensive evaluations, arguably reflective of many data-driven optimization problems, validate the benefits (e.g., effectiveness with respect to solution quality and adaptability with respect to second-order learning) and the costs of our meta-framework.
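The outer/inner structure described above can be illustrated with a much simpler stand-in. In this sketch, each inner ES is a (1+1)-ES with a 1/5th-success-rule step size rather than LM-CMA, the inner runs execute sequentially in place of parallel workers, and the outer ES keeps an elitist best-so-far alongside a multi-recombined mean, then updates the global step size as the geometric mean of the best inner step sizes. Evolution-path alignment and CMA reconstruction are not modeled; all function names, the sphere objective, and every parameter value are hypothetical.

```python
import math
import random


def sphere(x):
    return sum(v * v for v in x)


def inner_es(x, sigma, f, iters, rng):
    """(1+1)-ES with a 1/5th-success-rule step size; stands in for one inner LM-CMA run."""
    fx = f(x)
    for _ in range(iters):
        y = [v + sigma * rng.gauss(0.0, 1.0) for v in x]
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= math.exp(0.2)    # success: enlarge step
        else:
            sigma *= math.exp(-0.05)  # failure: shrink (balanced at 1/5 success rate)
    return x, fx, sigma


def meta_es(f, dim, n_inner=4, rounds=20, iters=50, seed=0):
    rng = random.Random(seed)
    mean = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    sigma = 1.0
    best_x, best_f = list(mean), f(mean)
    for _ in range(rounds):
        # outer ES: each inner ES starts from the shared mean with a perturbed step size
        results = [
            inner_es(list(mean), sigma * math.exp(rng.gauss(0.0, 0.5)), f, iters, rng)
            for _ in range(n_inner)
        ]
        results.sort(key=lambda r: r[1])
        if results[0][1] < best_f:  # elitist archive of the best-so-far solution
            best_x, best_f = results[0][0], results[0][1]
        mu = max(1, n_inner // 2)
        # multi-recombination: average the top-mu inner solutions
        recombined = [sum(r[0][i] for r in results[:mu]) / mu for i in range(dim)]
        fr = f(recombined)
        if fr < best_f:
            best_x, best_f = recombined, fr
        # next mean: the recombined point unless it is worse than the elite
        mean = recombined if fr <= best_f else list(best_x)
        # outer step-size update: geometric mean of the top-mu adapted step sizes
        sigma = math.exp(sum(math.log(r[2]) for r in results[:mu]) / mu)
    return best_x, best_f
```

Falling back to the elite when recombination regresses mirrors, in spirit, the abstract's pairing of elitist and multi-recombination updates to guard against regression and stagnation.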
Citations: 0
SR-FDIL: Synergistic Replay for Federated Domain-Incremental Learning
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-08-02 | DOI: 10.1109/TPDS.2024.3436874
Yichen Li; Wenchao Xu; Yining Qi; Haozhao Wang; Ruixuan Li; Song Guo
Federated Learning (FL) allows multiple clients to collaboratively train a model while keeping their data local. However, existing FL approaches typically assume that each client's data is static and fixed, which cannot account for incremental data with domain shift and leads to catastrophic forgetting of previous domains, particularly when clients are common edge devices that may lack the storage to retain full samples of each domain. To tackle this challenge, we propose Federated Domain-Incremental Learning via Synergistic Replay (SR-FDIL), which alleviates catastrophic forgetting by coordinating all clients to cache samples and replay them. Specifically, when new data arrives, each client selects samples to cache based not only on their importance in the local dataset but also on their correlation with the global dataset. Moreover, to balance learning new data against memorizing old data, we propose a novel client selection mechanism that jointly considers the importance of both old and new data. Extensive experiments on several datasets demonstrate that SR-FDIL outperforms state-of-the-art methods by up to 4.05% in average accuracy across all domains.
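The sample-caching criterion above (local importance combined with correlation to the global data) might look like the following sketch. The scoring rule is an assumption, not SR-FDIL's formula: `local_importance` could be, for example, a per-sample loss, correlation is approximated by cosine similarity between a sample's feature vector and a hypothetical `global_prototype`, and `alpha` is an illustrative weighting parameter.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_replay_samples(features, local_importance, global_prototype, budget, alpha=0.5):
    """Return the (sorted) indices of the `budget` samples with the highest
    combined score: alpha * local importance + (1 - alpha) * global similarity."""
    scores = [
        alpha * imp + (1.0 - alpha) * cosine(f, global_prototype)
        for f, imp in zip(features, local_importance)
    ]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])
```

Setting `alpha=1.0` reduces the rule to purely local importance and `alpha=0.0` to purely global similarity, which makes the trade-off easy to inspect on a small example.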
Citations: 0
Cost-Effective and Robust Service Provisioning in Multi-Access Edge Computing
IF 5.6 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS | Pub Date: 2024-07-30 | DOI: 10.1109/TPDS.2024.3435929
Zhengzhe Xiang; Yuhang Zheng; Dongjing Wang; Javid Taheri; Zengwei Zheng; Minyi Guo
With the development of multi-access edge computing (MEC) technology, an increasing number of researchers and developers are deploying computation-intensive and IO-intensive services (especially AI services) on edge devices. Because these devices are close to end users, they provide better performance in mobile environments. Building a service provisioning system at the network edge significantly reduces latency thanks to short-distance communication with edge servers. However, since MEC-based service provisioning systems are resource-sensitive and the network may be unstable, careful resource allocation and traffic scheduling strategies are essential. This paper investigates and quantifies the cost-effectiveness and robustness of MEC-based service provisioning systems under the applied resource allocation and traffic scheduling strategies. Based on this analysis, we propose a cost-effective and robust service provisioning algorithm, termed CERA, that minimizes deployment costs while maintaining system robustness. Extensive experiments compare the proposed approach with well-known baseline algorithms and evaluate the factors affecting the results. The findings demonstrate that CERA achieves at least 15.9% better performance than the baseline algorithms across various instances.
Citations: 0
Privacy Preserving Task Push in Spatial Crowdsourcing With Unknown Popularity
IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-29 · DOI: 10.1109/TPDS.2024.3434978
Yin Xu;Mingjun Xiao;Jie Wu;He Sun
In this paper, we investigate the privacy-preserving task push problem with unknown popularity in Spatial Crowdsourcing (SC), where the platform must select tasks whose popularity is unknown and push them to workers. Workers' preferences and tasks' popularity values may contain sensitive information that must be protected from disclosure. To address these concerns, we propose a Privacy Preserving Auction-based Bandit scheme, termed PPAB. Specifically, building on the Combinatorial Multi-armed Bandit (CMAB) game, we first construct a Differentially Private Auction-based CMAB (DPA-CMAB) model. Under this model, we design a privacy-preserving arm-pulling policy based on Diffie-Hellman (DH), Differential Privacy (DP), and the upper confidence bound (UCB); it comprises a DH-based encryption mechanism and a hybrid DP-based protection mechanism. The policy can not only learn task popularity and make online task push decisions, but also prevent both task popularity and workers' preferences from being revealed. In addition, we design an auction-based incentive mechanism that determines the payment for each selected task. We further analyze the security and online performance of PPAB in depth and prove that it satisfies several desirable properties (truthfulness, individual rationality, and computational efficiency). Finally, extensive simulations on a real-world dataset confirm the strong performance of PPAB.
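To show how an upper-confidence-bound arm-pulling policy and differential privacy can fit together, here is a minimal sketch. It is not the PPAB policy: it omits the Diffie-Hellman encryption, the hybrid DP mechanism, and the auction. Instead it uses plain combinatorial UCB, pushing the top `m` tasks by UCB index each round, where each popularity observation passes through the Laplace mechanism before the learner sees it. Assumptions labeled as hypothetical: popularity feedback is Bernoulli with rewards in [0, 1] (sensitivity 1, hence Laplace scale `1/epsilon`), and the names `NoisyCucb`, `simulate`, `m`, `epsilon`, and the popularity values are illustrative.

```python
import math
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class NoisyCucb:
    """Hypothetical combinatorial UCB that selects m of n tasks per round.

    Each observed reward in [0, 1] is released through the Laplace mechanism
    with scale 1/epsilon (epsilon-DP for that single observation) before it
    updates the empirical means, so the learner only ever sees noisy feedback.
    """

    def __init__(self, n: int, m: int, epsilon: float, seed: int = 0) -> None:
        self.n, self.m, self.epsilon = n, m, epsilon
        self.counts = [0] * n      # times each task was pushed
        self.sums = [0.0] * n      # sum of privatized rewards per task
        self.t = 0                 # round counter
        self.rng = random.Random(seed)

    def index(self, i: int) -> float:
        if self.counts[i] == 0:
            return math.inf        # force one initial push of every task
        mean = self.sums[i] / self.counts[i]
        return mean + math.sqrt(1.5 * math.log(self.t) / self.counts[i])

    def select(self) -> List[int]:
        self.t += 1
        ranked = sorted(range(self.n), key=self.index, reverse=True)
        return ranked[: self.m]    # super-arm: the m tasks with highest UCB

    def update(self, chosen: Sequence[int], rewards: Sequence[float]) -> None:
        for i, r in zip(chosen, rewards):
            self.counts[i] += 1
            self.sums[i] += r + laplace(1.0 / self.epsilon, self.rng)


def simulate(popularity: Sequence[float], m: int, epsilon: float,
             rounds: int, seed: int = 0) -> List[int]:
    """Bernoulli popularity feedback; returns how often each task was pushed."""
    env = random.Random(seed + 1)
    agent = NoisyCucb(len(popularity), m, epsilon, seed)
    for _ in range(rounds):
        chosen = agent.select()
        rewards = [1.0 if env.random() < popularity[i] else 0.0 for i in chosen]
        agent.update(chosen, rewards)
    return agent.counts


if __name__ == "__main__":
    print(simulate([0.9, 0.8, 0.3, 0.2, 0.1], m=2, epsilon=1.0, rounds=5000))
```

Over enough rounds the two most popular tasks should dominate the push counts, while smaller `epsilon` (stronger privacy) adds noise and slows learning. A principled private bandit would also widen the confidence bonus to account for the injected noise variance. This sketch keeps the standard bonus for brevity.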
Citations: 0
A State-of-the-Art Review with Code about Connected Components Labeling on GPUs
IF 5.3 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-29 · DOI: 10.1109/tpds.2024.3434357
Federico Bolelli;Stefano Allegretti;Luca Lumetti;Costantino Grana
Citations: 0
SSA: A Uniformly Recursive Bidirection-Sequence Systolic Sorter Array
IF 5.6 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, THEORY & METHODS · Pub Date: 2024-07-26 · DOI: 10.1109/TPDS.2024.3434332
Teng Gao;Lan Huang;Shang Gao;Kangping Wang
Reconfigurable circuits with parallel computing capabilities have been explored to improve sorting performance and reduce power consumption. However, most sorting algorithms that run on dedicated processors are designed solely around parallelizing the algorithm, without considering specialized hardware structures. This causes problems including, but not limited to, excessive consumption of I/O interface resources and on-chip storage resources, and complex layout and wiring. In this paper, we propose a Systolic Sorter Array (SSA), implemented via a Uniform Recurrence Equation (URE) and highly parameterized in terms of data size, bit width, and data type. Leveraging this uniformly recursive structure, the sorter can sort two independent sequences simultaneously. In addition, we implement global and local control modes on the FPGA to achieve higher computing frequencies. In our experiments, we report the speed-up of SSA relative to other state-of-the-art (SOTA) sorting algorithms, using C++ std::sort() as the baseline. Inheriting the benefits of the systolic array architecture, SSA reaches a computing frequency of up to 810 MHz on the U200. The results show that SSA outperforms the other sorting algorithms in throughput, speed-up ratio, and computing frequency.
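The systolic principle behind a sorter array can be illustrated with a cycle-by-cycle software model of a textbook linear systolic insertion sorter. This is a sketch of that generic technique, not the paper's URE-derived SSA: it sorts a single sequence and does not model the bidirectional two-sequence mode, bit widths, or the FPGA control modes. Each of `n` cells holds one value (initially a +inf placeholder). On every tick, a cell that receives a value from its left neighbour keeps the smaller of that value and the one it holds, and forwards the larger one to the right. Because every cell reads only the previous tick's link registers, the inner loop models cells that would all update in parallel in one hardware clock cycle. The function name `systolic_sort` is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def systolic_sort(values: Sequence[float]) -> List[float]:
    """Cycle-level model of a linear systolic insertion sorter with len(values) cells.

    Each tick, every cell that receives a value compares it with the value it
    holds, keeps the smaller one, and forwards the larger one to its right
    neighbour. Cells start out holding +inf placeholders, which are eventually
    pushed off the right end of the array.
    """
    n = len(values)
    cells: List[float] = [math.inf] * n          # value held by each cell
    links: List[Optional[float]] = [None] * n    # value entering cell i this tick
    next_input = 0
    while next_input < n or any(x is not None for x in links):
        new_links: List[Optional[float]] = [None] * n
        for i, x in enumerate(links):            # all cells act in the same tick
            if x is None:
                continue
            keep, passed = (x, cells[i]) if x < cells[i] else (cells[i], x)
            cells[i] = keep
            if i + 1 < n:                        # the last cell only ever ejects +inf
                new_links[i + 1] = passed
        if next_input < n:                       # feed one new element per tick
            new_links[0] = values[next_input]
            next_input += 1
        links = new_links
    return cells


if __name__ == "__main__":
    print(systolic_sort([5, 1, 4, 2, 3]))
```

The example prints `[1, 2, 3, 4, 5]`: after the pipeline drains, cell `i` holds the `(i+1)`-th smallest input. The structure is regular and uses only nearest-neighbour links, which is why systolic designs avoid the long wires and wide I/O that the abstract identifies as problems in purely algorithm-driven sorters.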
Citations: 0