
Latest Articles from IEEE Transactions on Parallel and Distributed Systems

Taking RNA-RNA Interaction to Machine Peak
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-21 · DOI: 10.1109/TPDS.2024.3380443
Chiranjeb Mondal;Sanjay Rajopadhye
RNA-RNA interactions (RRIs) are essential in many biological processes, including gene transcription, translation, and localization. They play a critical role in diseases such as cancer and Alzheimer's. Algorithms to model RRI typically use dynamic programming and have complexity $\Theta(N^{3}M^{3})$ in time and $\Theta(N^{2}M^{2})$ in space, where $N$ and $M$ are the lengths of the two RNA sequences. This makes it both essential and challenging to parallelize them. Previous efforts to do so have been hand-optimized, which is prone to human error and costly to develop and maintain. This paper presents a multi-core CPU parallelization of BPMax, one of the simpler RRI algorithms, generated by a user-guided polyhedral code generation tool, AlphaZ. The user starts with a mathematical specification of the dynamic programming algorithm and provides the choice of polyhedral program transformations such as schedules, memory maps, and multi-level tiling. AlphaZ automatically generates highly optimized code. At the lowest level, we implemented a small hand-optimized register-tiled “matrix max-plus” kernel and integrated it with our tool-generated optimized code. Our final optimized program version is about $400\times$ faster than the base program, translating to around 312 GFLOPS, more than half of our platform's Roofline Machine Peak (RMP) performance. On a single core, we attain 80% of RMP. The main kernel in the algorithm, whose complexity is $\Theta(N^{3}M^{3})$, attains 58 GFLOPS on a single core and 344 GFLOPS on multi-core (90% and 58% of RMP, respectively).
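The kernel at the heart of BPMax is a max-plus analogue of matrix multiplication. A minimal reference sketch in plain NumPy (not the authors' register-tiled kernel, which vectorizes and tiles this loop nest):

```python
import numpy as np

def max_plus(A, B):
    """Max-plus 'matrix product': C[i, j] = max_k (A[i, k] + B[k, j]).

    A naive reference version of the kernel the paper register-tiles;
    semantically, (max, +) replaces (+, *) of an ordinary matrix product.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.full((n, m), -np.inf)
    for kk in range(k):
        # Rank-1 max-plus update over the shared dimension.
        C = np.maximum(C, A[:, kk:kk + 1] + B[kk:kk + 1, :])
    return C
```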
Citations: 0
Taking Advantage of the Mistakes: Rethinking Clustered Federated Learning for IoT Anomaly Detection
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-21 · DOI: 10.1109/TPDS.2024.3379905
Jiamin Fan;Kui Wu;Guoming Tang;Yang Zhou;Shengqiang Huang
Clustered federated learning (CFL) is a promising solution to address the non-IID problem in the spatial domain for federated learning (FL). However, existing CFL solutions overlook the non-IID issue in the temporal domain and lack consideration of time efficiency. In this work, we propose a novel approach, called ClusterFLADS, which takes advantage of the false predictions of the inappropriate global models, together with knowledge of temperature scaling and catastrophic forgetting to reveal distributional similarities between the training data (of different clusters) and the test data. Additionally, we design an efficient feature extraction scheme by exploiting the role of each layer in a neural network's learning process. By strategically selecting model parameters and using PCA for dimensionality reduction, ClusterFLADS effectively improves clustering speed. We evaluate ClusterFLADS using real-world IoT trace data in various scenarios. Our results show that ClusterFLADS accurately and efficiently clusters clients, achieving a 100% true positive rate and low false positives across various data distributions in both the spatial and temporal domains.
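Temperature scaling, one ingredient ClusterFLADS borrows, reshapes a model's softmax confidence so that distributional mismatch between a model and test data becomes visible. A hedged sketch of the basic operation only (not the paper's full clustering pipeline):

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T).

    Higher T flattens the distribution, exposing how (over)confident a
    model is; a model applied to data unlike its training distribution
    tends to produce distinctively shaped confidence profiles.
    """
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()
```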
Citations: 0
HI-Kyber: A Novel High-Performance Implementation Scheme of Kyber Based on GPU
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-20 · DOI: 10.1109/TPDS.2024.3379734
Xinyi Ji;Jiankuo Dong;Tonggui Deng;Pinchang Zhang;Jiafeng Hua;Fu Xiao
CRYSTALS-Kyber, as the only public key encryption (PKE) algorithm selected by the National Institute of Standards and Technology (NIST) in the third round, is considered one of the most promising post-quantum cryptography (PQC) schemes. Lattice-based cryptography uses computationally hard lattice problems to build secure encryption and decryption systems that resist attacks from quantum computing. Performance is an important bottleneck affecting the adoption of post-quantum cryptography. In this paper, we present a High-performance Implementation of Kyber (named HI-Kyber) on NVIDIA GPUs, which increases the key-exchange performance of Kyber to millions of operations per second. First, we propose a lattice-based PQC implementation architecture based on kernel fusion, which avoids redundant global-memory access operations. Second, we optimize and implement the core operations of CRYSTALS-Kyber, including the Number Theoretic Transform (NTT), inverse NTT (INTT), and pointwise multiplication. For the NTT operation in particular, the main computational bottleneck, three novel methods are proposed to explore extreme performance: sliced layer merging (SLM), sliced depth-first search (SDFS-NTT), and entire depth-first search (EDFS-NTT), which achieve speedups of 7.5%, 28.5%, and 41.6% over the native implementation. Third, we conduct comprehensive performance experiments with different parallel dimensions based on the above optimizations. Finally, our key exchange performance reaches 1,664 kops/s. Specifically, on the same platform, our HI-Kyber is 3.52× faster than the GPU implementation based on the same instruction set and 1.78× faster than the state-of-the-art one based on AI-accelerated tensor cores.
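The NTT at the heart of Kyber's polynomial arithmetic is a discrete Fourier transform over a finite field. A naive O(n²) reference sketch of the transform and its inverse (Kyber's actual parameters, negacyclic variant, and in-place butterfly network are not reproduced here):

```python
def ntt(a, root, mod):
    """Naive O(n^2) number theoretic transform:
    A[k] = sum_j a[j] * root^(j*k)  (mod q),
    where `root` is a primitive n-th root of unity mod q.
    """
    n = len(a)
    return [sum(a[j] * pow(root, j * k, mod) for j in range(n)) % mod
            for k in range(n)]

def intt(A, root, mod):
    """Inverse transform: run the NTT with root^-1 and scale by n^-1."""
    n = len(A)
    inv_root = pow(root, -1, mod)
    inv_n = pow(n, -1, mod)
    return [(x * inv_n) % mod for x in ntt(A, inv_root, mod)]
```

The optimized GPU kernels in the paper restructure exactly this computation into fused, memory-friendly butterfly passes.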
Citations: 0
Fed-RAC: Resource-Aware Clustering for Tackling Heterogeneity of Participants in Federated Learning
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-20 · DOI: 10.1109/TPDS.2024.3379933
Rahul Mishra;Hari Prabhat Gupta;Garvit Banga;Sajal K. Das
Federated Learning is a training framework that enables multiple participants to collaboratively train a shared model while preserving data privacy. The heterogeneity of the participants' devices and networking resources delays training and aggregation. The paper introduces a novel approach to federated learning by incorporating resource-aware clustering. This method addresses the challenges posed by the diverse devices and networking resources among participants. Unlike static clustering approaches, this paper proposes a dynamic method to determine the optimal number of clusters using the Dunn index. It adapts to the varying heterogeneity levels among participants, ensuring a responsive and customized approach to clustering. Next, the paper goes beyond empirical observations by providing a mathematical derivation of the communication rounds needed for convergence within each cluster. Further, the participant assignment mechanism ensures that devices and networking resources are allocated optimally. Afterwards, we incorporate a leader-follower technique, particularly through knowledge distillation, which improves the performance of lightweight models within clusters. Finally, experiments are conducted to validate the approach and compare it with the state-of-the-art. The results demonstrate an accuracy improvement of over 3% compared to the closest competitor and a reduction in communication rounds of around 10%.
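The Dunn index trades off cluster separation against cluster compactness, so sweeping candidate cluster counts and keeping the highest-scoring one yields a dynamic choice of k. A minimal sketch of the score itself (illustrative; not Fed-RAC's exact formulation):

```python
import numpy as np

def dunn_index(clusters):
    """Dunn index: (min inter-cluster distance) / (max intra-cluster diameter).

    `clusters` is a list of lists of points (1-D NumPy arrays).
    Higher is better: well-separated, tight clusters score high, so a
    sweep over candidate k would keep the k maximizing this value.
    """
    # Largest distance between any two points inside the same cluster.
    diam = max(
        (np.linalg.norm(a - b) for c in clusters for a in c for b in c),
        default=0.0,
    )
    # Smallest distance between points of two different clusters.
    sep = min(
        np.linalg.norm(a - b)
        for i, c in enumerate(clusters)
        for j, d in enumerate(clusters) if i < j
        for a in c for b in d
    )
    return sep / diam
```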
Citations: 0
CREPE: Concurrent Reverse-Modulo-Scheduling and Placement for CGRAs
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-16 · DOI: 10.1109/TPDS.2024.3402098
Chilankamol Sunny;Satyajit Das;Kevin J. M. Martin;Philippe Coussy
Coarse-Grained Reconfigurable Array (CGRA) architectures are popular as high-performance and energy-efficient computing devices. Compute-intensive loop constructs of complex applications are mapped onto CGRAs by modulo-scheduling the innermost loop dataflow graph (DFG). In the state-of-the-art approaches, mapping quality is typically determined by initiation interval (II), while schedule length for one iteration is neglected. However, for nested loops, schedule length becomes important. In this article, we propose CREPE, a Concurrent Reverse-modulo-scheduling and Placement technique for CGRAs that minimizes both II and schedule length. CREPE performs simultaneous modulo-scheduling and placement coupled with dynamic graph transformations, generating good-quality mappings with high success rates. Furthermore, we introduce a compilation flow that maps nested loops onto the CGRA and modulo-schedules the innermost loop using CREPE. Experiments show that the proposed solution outperforms the conventional approaches in mapping success rate and total execution time with no impact on the compilation time. CREPE maps all kernels considered while state-of-the-art techniques Crimson and Epimap failed to find a mapping or mapped at very high IIs. On a 2×4 CGRA, CREPE reports a 100% success rate and a speed-up up to 5.9× and 1.4× over Crimson with 78.5% and Epimap with 46.4% success rates respectively.
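The initiation interval (II) that modulo scheduling minimizes is bounded below by resource pressure and by loop-carried dependence cycles. A standard textbook sketch of that lower bound (not CREPE's scheduling algorithm):

```python
import math

def min_initiation_interval(n_ops, n_pes, recurrences):
    """Classical lower bound on the II for modulo scheduling.

    ResMII: ceil(#operations / #processing elements) -- resource pressure.
    RecMII: for each loop-carried dependence cycle, given as a pair
    (total_latency, total_dependence_distance), ceil(latency / distance).
    The achievable II is at least max(ResMII, RecMII).
    """
    res_mii = math.ceil(n_ops / n_pes)
    rec_mii = max(
        (math.ceil(lat / dist) for lat, dist in recurrences), default=1
    )
    return max(res_mii, rec_mii)
```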
Citations: 0
Adaptive Neural Control for a Network of Parabolic PDEs With Event-Triggered Mechanism
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-15 · DOI: 10.1109/TPDS.2024.3401164
Sai Zhang;Li Tang;Yan-Jun Liu
This paper investigates the finite-time consensus problem for nonlinear parabolic networks by designing a new tracking controller. For undirected topologies, the newly designed controller allows the consensus time to be optimized by adjusting the parameter $\beta$ $(0 < \beta < 1)$. First, the neural network approximation property is utilized to counteract the uncertain nonlinear dynamics of the agents, and an event-triggered mechanism is designed to save energy and reduce the communication burden. Second, a tracking control protocol based on the event-triggered mechanism is proposed, which drives the multi-agent system to reach leader-follower consensus in finite time. Then, by considering appropriate Lyapunov functions and using some important inequalities, a sufficient condition for achieving finite-time consensus in the multi-agent system is obtained. Finally, the effectiveness of the presented method is verified by simulation.
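The communication savings of an event-triggered mechanism come from broadcasting state only when it has drifted past a threshold since the last broadcast, instead of at every time step. A minimal scalar sketch of that triggering rule (illustrative; the paper's trigger is defined on the PDE network's error dynamics):

```python
def event_triggered_updates(states, threshold):
    """Return the time indices at which an agent would broadcast.

    An agent transmits only when its state deviates from the last
    transmitted value by more than `threshold`, rather than on a
    fixed schedule -- the source of the communication savings.
    """
    last_sent = states[0]
    events = [0]                      # the initial state is always sent
    for t, x in enumerate(states[1:], start=1):
        if abs(x - last_sent) > threshold:
            last_sent = x
            events.append(t)
    return events
```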
Citations: 0
DMA-Assisted I/O for Persistent Memory
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-14 · DOI: 10.1109/TPDS.2024.3373003
Dingding Li;Weijie Zhang;Mianxiong Dong;Kaoru Ota
Modern local persistent memory (PM) file systems often rely on CPU-based memory copying for data transfer between DRAM and PM, resulting in significant CPU resource consumption. While some nascent systems explore DMA (direct memory access) as an alternative for improved efficiency, the intricacies and trade-offs remain obscure. This paper investigates the feasibility of DMA for PM I/O and argues that it is not a straightforward replacement for CPU-based methods. Two key limitations hinder the direct adoption: poor performance for small data and limited bandwidth. To relieve these issues, we propose PM-DMA, a novel I/O mechanism that leverages the strengths of both CPU and DMA. It incorporates three key components: (1) L-Switch, seamlessly switches between CPU and DMA modes based on workload characteristics, maximizing performance; (2) D-Pool, reduces DMA setup overhead, improving responsiveness; (3) P-Mode, allows servicing requests through multiple channels, even hybrid CPU-DMA ones, for enhanced throughput. We implemented PM-DMA on two well-known PM file systems, NOVA and WineFS, utilizing Intel I/OAT technology. Our experimental results demonstrate substantial CPU consumption reductions across diverse workloads. Notably, under heavy load, PM-DMA delivers up to a $10.4\times$ performance improvement.
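The core intuition behind an L-Switch-style policy is that DMA setup cost dominates for small copies, so those should stay on the CPU, while large copies amortize the setup and benefit from offload. A hypothetical sketch of such a size-based switch (the 16 KiB threshold is illustrative, not the paper's measured crossover, and the real L-Switch also considers workload characteristics):

```python
def choose_engine(size, dma_threshold=16 * 1024):
    """Pick the copy engine for one I/O request of `size` bytes.

    Small copies stay on the CPU (DMA descriptor setup and completion
    polling would dominate); large copies are offloaded to DMA to free
    CPU cycles. Threshold is a hypothetical placeholder value.
    """
    return "dma" if size >= dma_threshold else "cpu"
```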
Citations: 0
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
IF 5.3 · Zone 2, Computer Science · Q1 Computer Science · Pub Date: 2024-03-14 · DOI: 10.1109/TPDS.2024.3374555
Runzhou Han;Mai Zheng;Suren Byna;Houjun Tang;Bin Dong;Dong Dai;Yong Chen;Dongkyun Kim;Joseph Hassoun;David Thorsley
Data provenance, or data lineage, describes the life cycle of data. In scientific workflows on HPC systems, scientists often seek diverse provenance (e.g., origins of data products, usage patterns of datasets). Unfortunately, existing provenance solutions cannot address the challenges due to their incompatible provenance models and/or system implementations. In this paper, we analyze four representative scientific workflows in collaboration with the domain scientists to identify concrete provenance needs. Based on the first-hand analysis, we propose a provenance framework called PROV-IO+, which includes an I/O-centric provenance model for describing scientific data and the associated I/O operations and environments precisely. Moreover, we build a prototype of PROV-IO+ to enable end-to-end provenance support on real HPC systems with little manual effort. The PROV-IO+ framework can support both containerized and non-containerized workflows on different HPC platforms with flexibility in selecting various classes of provenance. Our experiments with realistic workflows show that PROV-IO+ can address the provenance needs of the domain scientists effectively with reasonable performance (e.g., less than 3.5% tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a state-of-the-art system (i.e., ProvLake) in our experiments.
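An I/O-centric provenance model relates the entities touched by I/O (files, datasets) to the activities (I/O calls) and agents (programs, users) involved, in the spirit of the W3C PROV data model. A minimal sketch of such a record and a lineage query (illustrative; the field names are assumptions, not PROV-IO+'s actual schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvRecord:
    """One PROV-style provenance record for an I/O event: which activity
    (e.g., a write call) related which entity (a file or dataset) to
    which agent (a program or user)."""
    entity: str     # e.g., a file path such as "/scratch/out.h5"
    activity: str   # e.g., an I/O call such as "H5Dwrite"
    agent: str      # e.g., the program that issued the call

def lineage(records, entity):
    """Answer a basic provenance query: every (activity, agent) pair
    that touched the given entity, in recorded order."""
    return [(r.activity, r.agent) for r in records if r.entity == entity]
```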
Citations: 0
Availability-Aware Revenue-Effective Application Deployment in Multi-Access Edge Computing
IF 5.3, CAS Zone 2 (Computer Science), Q1 Computer Science. Pub Date: 2024-03-13. DOI: 10.1109/TPDS.2024.3399840
Lu Zhao;Fu Xiao;Bo Li;Jian Zhou;Xiaolong Xu;Yun Yang
Multi-access edge computing (MEC) has emerged as a promising computing paradigm to push computing resources and services to the network edge. It allows applications/services to be deployed on edge servers for provisioning low-latency services to nearby users. However, in the MEC environment, edge servers may suffer from failures while the app vendor has to guarantee continuously available services to its users, thereby securing its revenue for application instances deployed. In this paper, we focus on available service provisioning when cost-effectively deploying application instances on edge servers. We first formulate a novel Availability-aware Revenue-effective Application Deployment (ARAD) problem in the MEC environment with the aim to maximize the overall revenue by considering both service availability benefit and deployment cost. We prove that the ARAD problem is $\mathcal{NP}$-hard. Then, we propose an approximation algorithm named ARAD-A to find the ARAD solution efficiently with a constant approximation ratio of $\frac{1}{2}$. We extensively evaluate the performance of ARAD-A against five representative approaches. Experimental results demonstrate that our ARAD-A can achieve the best performance in securing the app vendor's overall revenue.
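As a rough illustration of trading availability benefit against deployment cost, one can rank candidate edge servers by net revenue density and deploy greedily while a budget allows. This toy greedy is not the paper's ARAD-A algorithm (which has a proven $\frac{1}{2}$ approximation ratio); the server data, benefit model, and budget below are all invented for the example:

```python
# Toy availability-aware greedy placement (NOT the paper's ARAD-A).
# Each server covers some users, is available with some probability,
# and costs something to deploy an application instance on.

def deploy(servers, benefit_per_user, budget):
    """Pick servers by net-revenue density, respecting a budget.

    servers: list of (name, users_covered, availability, cost).
    Returns (chosen server names, total cost spent).
    """
    chosen, spent = [], 0.0
    # Rank by (expected availability benefit - cost) per unit cost.
    ranked = sorted(
        servers,
        key=lambda s: (s[1] * benefit_per_user * s[2] - s[3]) / s[3],
        reverse=True,
    )
    for name, users, avail, cost in ranked:
        net = users * benefit_per_user * avail - cost
        if net > 0 and spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

servers = [
    ("edge-A", 120, 0.99, 40.0),
    ("edge-B", 60, 0.95, 50.0),
    ("edge-C", 200, 0.90, 90.0),
]
print(deploy(servers, benefit_per_user=1.0, budget=120.0))
# (['edge-A', 'edge-B'], 90.0) -- edge-C has high net revenue but
# no longer fits the budget once edge-A is taken.
```

Greedy-by-density heuristics like this have no general guarantee on this problem; they only show why both availability and cost must enter the placement decision.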
Citations: 0
Spiking Neural P Systems With Microglia
IF 5.3, CAS Zone 2 (Computer Science), Q1 Computer Science. Pub Date: 2024-03-13. DOI: 10.1109/TPDS.2024.3399755
Yuzhen Zhao;Xiyu Liu
Spiking neural P systems (SNP systems), one of the parallel and distributed computing models with biological interpretability, have been a hot research topic in bio-inspired computational models in recent years. To improve the stability of the models, this study introduces microglia from the biological nervous system into SNP systems and proposes SNP systems with microglia (MSNP systems). In MSNP systems, besides neurons, another cell type named microglia is introduced. Microglia help neurons within their range of action maintain homeostasis and prevent excitotoxicity, i.e., excessive excitability. Specifically, microglia use a new microglial maintenance rule to lower the number of spikes in neurons within their range of action when it is too high. The computational capability and efficiency of MSNP systems are also proved. This study makes SNP systems more stable and, to some degree, avoids data overflow and data explosion problems.
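The microglial maintenance rule can be mimicked in a toy simulation: neurons fire by a simple consume/produce rule, and a microglia object caps the spike count of neurons in its range before they fire. The class names, the specific firing rule, and the cap value below are hypothetical, not the paper's formal MSNP syntax:

```python
# Toy MSNP-style step: a microglial maintenance rule caps spike counts
# before neurons fire. Illustrative only; the paper's formal rule
# notation and semantics differ.

class Neuron:
    def __init__(self, spikes=0):
        self.spikes = spikes

    def step(self):
        """Toy firing rule: consume 2 spikes, emit 1 to synapse targets."""
        if self.spikes >= 2:
            self.spikes -= 2
            return 1
        return 0

class Microglia:
    """Caps spike counts of neurons within its range of action."""
    def __init__(self, targets, cap):
        self.targets, self.cap = targets, cap

    def maintain(self):
        for n in self.targets:
            if n.spikes > self.cap:   # prevent "excitotoxicity"
                n.spikes = self.cap   # (excessive spike accumulation)

n1, n2 = Neuron(spikes=9), Neuron(spikes=0)
glia = Microglia(targets=[n1, n2], cap=4)

glia.maintain()                # lowers n1 from 9 spikes to the cap of 4
n2.spikes += n1.step()         # n1 fires: 4 -> 2 spikes, n2 receives 1
print(n1.spikes, n2.spikes)    # 2 1
```

Without the maintenance step, n1's spike count could grow without bound across steps, which is exactly the overflow behavior the microglia are introduced to prevent.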
Citations: 0