
IEEE Transactions on Parallel and Distributed Systems: Latest Publications

AdaptChain: Adaptive Data Sharing and Synchronization for NFV Systems on Heterogeneous Architectures
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-13 | DOI: 10.1109/TPDS.2024.3400594
Kai Zhang;Jiahui Hong;Zhengying He;Yinan Jing;X. Sean Wang
In a Network Function Virtualization (NFV) system, network functions (NFs) are implemented on general-purpose hardware, including CPUs, GPUs, and FPGAs. Studies have shown that there is no one-size-fits-all processor, as each processor demonstrates performance advantages in implementing certain types of NFs. With more general-purpose processors such as GPUs being deployed in data center servers, the best practice for building a high-performance NFV service chain is to employ the available heterogeneous processors. However, current NFV systems fail to utilize these processors for acceleration. This is because, due to separate memory spaces, data synchronization is required to guarantee correctness, which can incur non-trivial overhead and result in low performance. This paper proposes AdaptChain, a data management facility that enables adaptive data sharing and synchronization for hybrid NFV systems on heterogeneous architectures. AdaptChain shares the host and device memory among NFs in a service chain. With adaptive synchronization plan generation and NF code adaptation, AdaptChain exploits three classes of opportunities to reduce the amount of synchronized data while guaranteeing correctness. Experimental results show that AdaptChain improves the overall throughput by up to 3.2× and reduces the latency by up to 52%.
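The abstract gives no code, but the synchronization-plan idea can be sketched: copy a buffer across a CPU/GPU boundary only when the upstream NF dirtied data that the downstream NF actually reads. The sketch below is a simplified illustration under that assumption; the class and field names are hypothetical, not AdaptChain's actual API.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class NF:
    name: str
    device: str                       # "cpu" or "gpu"
    reads: frozenset = field(default_factory=frozenset)
    writes: frozenset = field(default_factory=frozenset)

def plan_synchronization(chain):
    """Emit one copy per cross-device hand-off, restricted to the packet
    fields the upstream NF dirtied AND the downstream NF will read."""
    plan = []
    for prev, nxt in zip(chain, chain[1:]):
        if prev.device == nxt.device:
            continue                          # same memory space: nothing to sync
        needed = prev.writes & nxt.reads      # skip unmodified or unread data
        if needed:
            plan.append((prev.name, nxt.name, sorted(needed)))
    return plan

chain = [
    NF("nat",     "cpu", reads=frozenset({"hdr"}), writes=frozenset({"hdr"})),
    NF("ipsec",   "gpu", reads=frozenset({"hdr", "payload"}),
                         writes=frozenset({"payload"})),
    NF("monitor", "cpu", reads=frozenset({"hdr"})),
]
# nat->ipsec syncs only "hdr"; ipsec->monitor syncs nothing ("payload" is unread).
print(plan_synchronization(chain))
```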
Citations: 0
Bayesian-Driven Automated Scaling in Stream Computing With Multiple QoS Targets
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-13 | DOI: 10.1109/TPDS.2024.3399834
Liang Zhang;Wenli Zheng;Kuangyu Zheng;Hongzi Zhu;Chao Li;Minyi Guo
Stream processing systems commonly rely on auto-scaling to ensure resource efficiency and quality of service (QoS). Existing auto-scaling solutions lack accuracy in resource allocation because they rely on static QoS-resource models that fail to account for high workload variability, and because they use indirect metrics carrying much distracting information. Moreover, different types of QoS metrics present different characteristics and thus need individual auto-scaling methods. In this paper, we propose a versatile auto-scaling solution for operator-level parallelism configuration, called AuTraScale+, to meet throughput, processing-time latency, and event-time latency targets. AuTraScale+ follows the Bayesian optimization framework to make scaling decisions. First, it uses a Gaussian process model to eliminate the negative influence of uncertain factors on the accuracy of the performance model. Second, it leverages the expected-improvement-based (EI-based) acquisition function to search for and recommend the optimal configuration quickly. In addition, to make accurate scaling decisions before a new model is ready, AuTraScale+ uses a transfer learning algorithm that estimates the benefit of every configuration at a new input rate from existing models and then recommends the optimal one. We implement and evaluate AuTraScale+ on the Flink platform. Experimental results on three representative workloads demonstrate that, compared with state-of-the-art methods, AuTraScale+ reduces resource consumption by 66.6% and 36.7% in the scale-down and scale-up scenarios, respectively, while achieving the throughput and processing-time latency targets. Compared with other methods of optimizing event-time latency, AuTraScale+ saves 26.9% of resources on average.
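The core loop the abstract describes, a Gaussian process performance model plus an EI acquisition function, is standard Bayesian optimization and can be sketched in a few lines. The benchmark hook below is a synthetic stand-in for measuring a Flink job, not AuTraScale+'s system; only the GP/EI mechanics are faithful to the description.

```python
# Minimal Bayesian-optimization sketch: pick operator parallelism by fitting a
# Gaussian process to observed throughput and maximizing Expected Improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(gp, X_cand, y_best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma                       # maximizing throughput
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)

def measure_throughput(parallelism):                # hypothetical benchmark hook
    return -(parallelism - 13) ** 2 + np.random.normal(scale=2.0)

candidates = np.arange(1, 33).reshape(-1, 1)        # parallelism degrees 1..32
X, y = [[4]], [measure_throughput(4)]               # one seed observation
for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    nxt = candidates[np.argmax(expected_improvement(gp, candidates, max(y)))]
    X.append(list(nxt)); y.append(measure_throughput(nxt[0]))
print("best parallelism found:", X[int(np.argmax(y))][0])
```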
Citations: 0
Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-13 | DOI: 10.1109/TPDS.2024.3400365
Daniela Loreti;Marcello Artioli;Anna Ciampolini
The scale of today's High Performance Computing (HPC) systems is the key element behind their impressive performance, as well as the reason for their relatively limited reliability. Over the last decade, specific areas of HPC research have addressed the issue at different levels, by enriching the infrastructure, the platforms, or the algorithms with fault-tolerance features. In this work, we focus on the pervasive task of computing the solution of a dense, unstructured linear system, and we propose an algorithm-based technique that tolerates multiple faults at arbitrary locations during the parallel computation. We particularly study ways to boost the performance of the rollback-free recovery, and we provide an extensive evaluation of our technique with respect to other state-of-the-art algorithm-based methods.
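The underlying idea of algorithm-based, rollback-free recovery is that the matrix carries checksums, so a corrupted entry can be rebuilt in place instead of restarting the computation. The toy below illustrates that principle on a static matrix with row and column checksums; the paper applies it inside a parallel dense solver, which is not shown here.

```python
# Toy ABFT illustration: locate and repair one corrupted entry from checksums,
# with no rollback. n-by-n data block plus one checksum row and column.
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
Ac = np.hstack([A, A.sum(axis=1, keepdims=True)])    # append row checksums
Af = np.vstack([Ac, Ac.sum(axis=0, keepdims=True)])  # append column checksums

Af[2, 4] += 42.0                                     # simulate an anywhere-located fault

row_err = Af[:n, :n].sum(axis=1) - Af[:n, n]         # which row disagrees?
col_err = Af[:n, :n].sum(axis=0) - Af[n, :n]         # which column disagrees?
fi = int(np.argmax(np.abs(row_err)))
fj = int(np.argmax(np.abs(col_err)))
Af[fi, fj] -= row_err[fi]                            # rebuild the entry in place
assert np.allclose(Af[:n, :n], A)                    # data fully recovered
```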
Citations: 0
Analytical Modeling and Throughput Computation of Blockchain Sharding
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-12 | DOI: 10.1109/TPDS.2024.3376452
Pourya Soltani;Farid Ashtiani
Sharding has shown great potential to scale out blockchains. It divides nodes into smaller groups, which allows for partial transaction processing, relaying, and storage. Hence, instead of running one blockchain, we run multiple blockchains in parallel, calling each one a shard. Sharding addresses the shortcomings caused by the compulsory duplication of three resources in blockchains: computation, communication, and storage. The most pressing issue in blockchains today is throughput. In this paper, we propose new queueing-theoretic models to derive the maximum throughput of sharded blockchains. We consider two cases, a fully sharded blockchain and computation sharding. We model each with a queueing network that exploits signals to account for block production as well as multi-destination cross-shard transactions. We ensure that every queue in our models satisfies quasi-reversibility, so that they fall into the category of product-form queueing networks. We then obtain a closed-form solution for the maximum stable throughput of these systems with respect to block size, block rate, the number of destinations per transaction, and the number of shards. Comparing the results obtained from the two sharding systems, we conclude that the extent of sharding in different domains plays a significant role in scalability.
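The paper's closed-form result is not reproduced in the abstract. As a rough intuition for why the listed parameters matter, the back-of-envelope capacity model below (explicitly not the paper's formula) treats a transaction with d destination shards as consuming a commit slot in each of them.

```python
# Simplified illustration only: k shards, each committing block_size
# transactions per block at block_rate blocks/s; a cross-shard transaction
# with avg_destinations destinations occupies a slot in every destination.
def max_throughput(num_shards, block_size, block_rate, avg_destinations):
    per_shard = block_size * block_rate      # tx/s one shard can commit
    return num_shards * per_shard / avg_destinations

# 16 shards, 1000-tx blocks every 10 s, 2 destination shards per tx:
print(max_throughput(num_shards=16, block_size=1000,
                     block_rate=0.1, avg_destinations=2.0))   # 800.0 tx/s
```

This already shows the qualitative trade-off the paper quantifies rigorously: adding shards scales capacity, while multi-destination transactions erode it.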
Citations: 0
AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-07 | DOI: 10.1109/TPDS.2024.3397800
Jinfan Chen;Shigang Li;Ran Guo;Jinhui Yuan;Torsten Hoefler
Recent advances in deep learning are driven by the growing scale of computation, data, and models. However, efficiently training large-scale models on distributed systems requires an intricate combination of data, operator, and pipeline parallelism, which places a heavy burden on machine learning practitioners. To this end, we propose AutoDDL, a distributed training framework that automatically explores and exploits new parallelization schemes with near-optimal bandwidth cost. AutoDDL facilitates the description and implementation of different schemes by utilizing OneFlow's Split, Broadcast, and Partial Sum (SBP) abstraction. AutoDDL is equipped with an analytical performance model combined with a customized coordinate-descent algorithm, which significantly reduces the scheme-searching overhead. We conduct evaluations on Multi-Node-Single-GPU and Multi-Node-Multi-GPU machines using different models, including VGG and Transformer. Compared to expert-optimized implementations, AutoDDL reduces the end-to-end training time by up to 31.1% and 10% for Transformer and up to 17.7% and 71.5% for VGG on the two parallel systems, respectively.
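OneFlow's SBP types (split, broadcast, partial_sum) are a documented public API, so the abstraction AutoDDL searches over can be shown directly. The sketch below, which needs a two-rank GPU launch to run, expresses two parallelizations of a matmul as SBP signatures; it illustrates the abstraction only, not AutoDDL's search.

```python
# Run with, e.g.: python -m oneflow.distributed.launch --nproc_per_node 2 sbp_demo.py
import oneflow as flow

placement = flow.placement("cuda", ranks=[0, 1])

# Data parallelism: activations split along the batch dim, weights broadcast.
x = flow.randn(8, 4).to_global(placement=placement, sbp=flow.sbp.split(0))
w = flow.randn(4, 3).to_global(placement=placement, sbp=flow.sbp.broadcast)
y = flow.matmul(x, w)      # result stays split(0); no communication needed

# Operator parallelism: split the contraction dim instead; each rank now holds
# a partial sum, and a subsequent scheme must pay bandwidth to reduce it.
x2 = flow.randn(8, 4).to_global(placement=placement, sbp=flow.sbp.split(1))
w2 = flow.randn(4, 3).to_global(placement=placement, sbp=flow.sbp.split(0))
y2 = flow.matmul(x2, w2)   # result carries sbp partial_sum
print(y.sbp, y2.sbp)
```

AutoDDL's job, per the abstract, is to choose among such signatures per layer so that the total communication implied by SBP conversions is near the bandwidth-optimal cost.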
Citations: 0
Graph-Centric Performance Analysis for Large-Scale Parallel Applications
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-06 | DOI: 10.1109/TPDS.2024.3396849
Yuyang Jin;Haojie Wang;Runxin Zhong;Chen Zhang;Xia Liao;Feng Zhang;Jidong Zhai
Performance analysis is essential for understanding the performance behavior of parallel programs and detecting performance bottlenecks. However, complex interactions among several types of performance bugs, as well as inter-process communication and data dependences, make efficient performance analysis even more difficult. Although many performance tools have been developed, accurately identifying the underlying performance bottlenecks in such complex scenarios requires specific in-depth analysis. Significant human effort and analysis knowledge are often required to implement each specific analytic task. To alleviate the complexity of developing specific performance analytic tasks, we present a programmable performance analysis tool, called PerFlow. In PerFlow, a step-by-step performance analysis process is represented as an Analysis Flow Diagram, which is constructed from several performance analysis sub-tasks, namely passes, that can be defined by developers or provided by PerFlow's built-in analysis pass library. Furthermore, we define a Performance Abstraction Graph to describe the performance behavior of a parallel program, where edges indicate the interactions between parallel units, so that analytic sub-tasks become graph analysis tasks. PerFlow provides a rich set of Python APIs for developing analytic tasks. Several case studies of real-world applications with up to 700 K lines of code demonstrate the effectiveness of PerFlow. The results indicate that PerFlow makes it much easier to implement specific performance analytic tasks, and that these tasks are performed automatically and efficiently to detect underlying performance bottlenecks.
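PerFlow's concrete API is not shown in the abstract, so rather than guess it, the sketch below demonstrates the graph-centric idea with networkx: performance behavior becomes a graph whose edges are interactions between parallel units, and an analysis pass becomes a graph query (here, a late-sender check on communication edges).

```python
import networkx as nx

g = nx.DiGraph()
g.add_node("rank0:compute", time=9.0)
g.add_node("rank1:compute", time=2.0)
g.add_node("rank1:recv", wait=6.8)
g.add_edge("rank0:compute", "rank1:recv", kind="msg")   # inter-process message
g.add_edge("rank1:compute", "rank1:recv", kind="dep")   # intra-process ordering

def late_sender_pass(graph, threshold=5.0):
    """A 'pass' in PerFlow's sense: scan communication edges and report
    receivers that waited longer than a threshold on their sender."""
    return [(u, v) for u, v, d in graph.edges(data=True)
            if d["kind"] == "msg" and graph.nodes[v].get("wait", 0) > threshold]

print(late_sender_pass(g))    # [('rank0:compute', 'rank1:recv')]
```

Chaining several such passes, each consuming the vertices or edges the previous one flagged, is the Analysis Flow Diagram pattern the paper describes.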
Citations: 0
FedREM: Guided Federated Learning in the Presence of Dynamic Device Unpredictability
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-06 | DOI: 10.1109/TPDS.2024.3396133
Linsi Lan;Junbo Wang;Zhi Li;Krishna Kant;Wanquan Liu
Federated learning (FL) is a promising distributed machine learning scheme in which multiple clients collaborate by sharing a common learning model while keeping their private data local. It applies to many scenarios, e.g., training an autonomous driving system from the perception data of multiple vehicles. However, some clients may join the training system dynamically, which affects the stability and accuracy of the learning system. Meanwhile, data heterogeneity in the FL system further exacerbates this problem due to imbalanced data distribution. To solve these problems, we propose a novel FL framework named FedREM (Retain-Expansion and Matching), which guides client model training through two mechanisms: 1) a Retain-Expansion mechanism that lets clients perform local training and automatically extract data characteristics during training, and 2) a Matching mechanism that ensures new clients quickly adapt to the global model by matching their data characteristics and adjusting the model accordingly. Results of extensive experiments verify that our FedREM outperforms various baselines in terms of model accuracy, communication efficiency, and system robustness.
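The abstract does not define which data characteristics FedREM extracts, so the sketch below uses a label histogram as a stand-in and shows only the Matching step: a newly joined client is routed to the candidate model whose recorded characteristics are closest to its own. All names are illustrative, not FedREM's implementation.

```python
import numpy as np

def match_client(client_hist, model_hists):
    """Return the index of the model whose recorded data characteristics
    (here: normalized label distributions) are nearest by L1 distance."""
    dists = [np.abs(client_hist - h).sum() for h in model_hists]
    return int(np.argmin(dists))

# Characteristics retained per model during earlier training rounds:
model_hists = [np.array([0.7, 0.2, 0.1]),
               np.array([0.1, 0.3, 0.6])]
new_client = np.array([0.15, 0.25, 0.60])   # new device, skewed toward class 2
print("adapt from model", match_client(new_client, model_hists))   # -> 1
```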
Citations: 0
Optimizing Multi-Grid Preconditioned Conjugate Gradient Method on Multi-Cores
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-05 | DOI: 10.1109/TPDS.2024.3372473
Fan Yuan;Xiaojian Yang;Shengguo Li;Dezun Dong;Chun Huang;Zheng Wang
Multigrid preconditioned conjugate gradient (MGPCG) is commonly used in high-performance computing (HPC) workloads. However, MGPCG is notoriously challenging to optimize, since most of its computation kernels are memory-bound, with low arithmetic intensity and non-trivial communication patterns among parallel processes. This article presents new techniques that improve data locality and reduce the communication overhead of MGPCG by first merging the kernels of multigrid (MG). We then develop an asynchronous neighboring-communication algorithm to reduce data communication across parallel processes. We demonstrate the benefits of our approach by applying it to the high-performance conjugate gradient (HPCG) benchmark and integrating it with a real-life algebraic multigrid package. We test the resulting software implementations on three ARMv8 systems and one Intel Xeon system. Experimental results show that our approach yields a 1.62x-2.54x speedup over the engineer- and vendor-tuned HPCG implementations across various workloads and platforms.
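As a toy illustration of the kernel-merging idea, the 1-D sketch below fuses a Jacobi smoothing sweep with the residual computation the next multigrid level needs, so both values are produced while the data is still hot, one pass over memory instead of two. The paper's fusion targets real MG kernels on multi-core CPUs; this shows only the principle, for the model problem -u'' = f with unit grid spacing.

```python
import numpy as np

def two_kernels(u, f):
    """Baseline: smoother and residual as two separate sweeps."""
    v, r = u.copy(), np.zeros_like(u)
    for i in range(1, len(u) - 1):                    # sweep 1: Jacobi smoother
        v[i] = 0.5 * (u[i-1] + u[i+1] + f[i])
    for i in range(1, len(u) - 1):                    # sweep 2: residual
        r[i] = f[i] - (2*v[i] - v[i-1] - v[i+1])
    return v, r

def fused_kernel(u, f):
    """Merged: emit the residual at i-1 as soon as v[i] is available."""
    n = len(u)
    v, r = u.copy(), np.zeros_like(u)
    for i in range(1, n - 1):
        v[i] = 0.5 * (u[i-1] + u[i+1] + f[i])
        if i >= 2:                                    # v[i-2..i] are now final
            r[i-1] = f[i-1] - (2*v[i-1] - v[i-2] - v[i])
    r[n-2] = f[n-2] - (2*v[n-2] - v[n-3] - v[n-1])    # drain the pipeline
    return v, r

u, f = np.zeros(8), np.ones(8)
v1, r1 = two_kernels(u, f); v2, r2 = fused_kernel(u, f)
assert np.allclose(v1, v2) and np.allclose(r1, r2)    # identical results
```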
Citations: 0
Revisiting PM-Based B+-Tree With Persistent CPU Cache
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-05 | DOI: 10.1109/TPDS.2024.3372621
Bowen Zhang;Shengan Zheng;Liangxu Nie;Zhenlin Qi;Hongyi Chen;Linpeng Huang;Hong Mei
Persistent memory (PM) promises near-DRAM performance as well as data persistence. Recently, a new feature called eADR has become available on PM-equipped platforms to guarantee the persistence of the CPU cache. The emergence of eADR presents unique opportunities to build lock-free data structures and unleash the full potential of PM. In this paper, we propose NBTree, a lock-free PM-friendly B+-Tree, to deliver high scalability and low PM overhead. To our knowledge, NBTree is the first persistent index designed for PM systems with a persistent CPU cache. To achieve lock-freedom, NBTree uses atomic primitives to serialize index operations. Moreover, NBTree introduces five novel techniques to enable lock-free accesses during structural modification operations (SMO), including three-phase SMO, sync-on-write, sync-on-read, cooperative SMO, and shift-aware search. To reduce PM access overhead, NBTree employs a decoupled leaf node design that absorbs metadata accesses in DRAM. Moreover, NBTree devises a cache-crafty persistent allocator and adopts log-structured insert and in-place update/delete to enhance the access locality of write operations, absorbing a substantial amount of PM writes in the persistent CPU cache. Our evaluation shows that NBTree achieves up to 11× higher throughput and 43× lower 99% tail latency than state-of-the-art persistent B+-Trees under YCSB workloads.
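Two of the design points can be shown structurally: the decoupled leaf keeps search metadata in DRAM (rebuildable after a crash), while keys and values live in PM, with log-structured inserts and in-place updates. The sketch below is a shape-only illustration; real NBTree relies on CAS/atomic primitives and eADR-persistent caches, neither of which Python models, so all concurrency is omitted.

```python
from dataclasses import dataclass, field

@dataclass
class LeafPM:                       # resides in persistent memory
    slots: list = field(default_factory=list)    # append-only (key, value) slots

@dataclass
class LeafMeta:                     # resides in DRAM; rebuilt on recovery
    index: dict = field(default_factory=dict)    # key -> slot position

def insert(pm: LeafPM, meta: LeafMeta, key, value):
    pm.slots.append((key, value))   # log-structured: one sequential PM write
    meta.index[key] = len(pm.slots) - 1          # metadata update stays in DRAM

def update(pm: LeafPM, meta: LeafMeta, key, value):
    pos = meta.index[key]
    pm.slots[pos] = (key, value)    # in-place: a single atomic store on real PM

def lookup(pm: LeafPM, meta: LeafMeta, key):
    pos = meta.index.get(key)       # search served entirely from DRAM metadata
    return None if pos is None else pm.slots[pos][1]

pm, meta = LeafPM(), LeafMeta()
insert(pm, meta, 7, "a")
update(pm, meta, 7, "b")
print(lookup(pm, meta, 7))          # "b"
```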
Citations: 0
FHVAC: Feature-Level Hybrid Video Adaptive Configuration for Machine-Centric Live Streaming
IF 5.3 | CAS Tier 2 (Computer Science) | JCR Q1 | Pub Date: 2024-03-04 | DOI: 10.1109/TPDS.2024.3372046
Yuanhong Zhang;Weizhan Zhang;Haipeng Du;Caixia Yan;Li Liu;Qinghua Zheng
With the widespread deployment of edge computing, the focus has shifted to machine-centric live video streaming, where endpoint-collected videos are transmitted over networks to edge servers for analysis. Unlike user-centric streaming, which maximizes the user's Quality of Experience (QoE), machine-centric video streaming optimizes the machine's Quality of Inference (QoI) by balancing inference accuracy, inference delay, and transmission latency through adaptive video configuration. Traditional heuristic configuration-adaptation methods are reliable but unable to respond to erratic network fluctuations. Reinforcement learning (RL) based algorithms exhibit superior flexibility, but their exploration mechanisms cause long-tail effects in upload latency. In this paper, we propose FHVAC, which dynamically selects video encoding parameters for live streaming by coherently fusing rule-based and RL-based agents at the feature level. We first develop a robust rule-based approach that ensures low transmission latency, and employ imitation learning to convert it into an equivalent neural network. We then design a novel module to combine the two approaches and assess various fusion mechanisms. Our evaluation of FHVAC across two vision tasks (pose estimation and semantic segmentation) in two scenarios (trace-driven simulation and a testbed-based experiment) shows that FHVAC enhances the average QoI and reduces tail latency by 10.61%-65.27% compared to prior work.
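A schematic of feature-level fusion as the abstract describes it: the rule-based policy is first distilled into a network via imitation learning, and a fusion module combines its hidden features with the RL agent's before an action head picks the encoding configuration. Layer sizes and the gating design below are illustrative assumptions, not FHVAC's actual architecture.

```python
import torch
import torch.nn as nn

class FusedPolicy(nn.Module):
    def __init__(self, obs_dim=10, hidden=32, n_actions=6):
        super().__init__()
        self.rule_net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())  # imitation-trained
        self.rl_net   = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())  # RL-trained
        self.gate = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.Sigmoid())
        self.head = nn.Linear(hidden, n_actions)      # one logit per encoding config

    def forward(self, obs):
        fr, fl = self.rule_net(obs), self.rl_net(obs)
        g = self.gate(torch.cat([fr, fl], dim=-1))    # learn how much to trust each agent
        return self.head(g * fr + (1 - g) * fl)       # fuse at the feature level

policy = FusedPolicy()
obs = torch.randn(1, 10)                   # e.g., bandwidth history, queue sizes
action = policy(obs).argmax(dim=-1)        # chosen encoding configuration
```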
Citations: 0