
Latest articles from ACM Transactions on Parallel Computing

Deterministic Constant-Amortized-RMR Abortable Mutex for CC and DSM
IF 1.6 Q2 Computer Science Pub Date: 2021-12-09 DOI: 10.1145/3490559
P. Jayanti, S. Jayanti
The abortable mutual exclusion problem, proposed by Scott and Scherer in response to the needs in real-time systems and databases, is a variant of mutual exclusion that allows processes to abort from their attempt to acquire the lock. Worst-case constant remote memory reference algorithms for mutual exclusion using hardware instructions such as Fetch&Add or Fetch&Store have long existed for both cache coherent (CC) and distributed shared memory multiprocessors, but no such algorithms are known for abortable mutual exclusion. Even relaxing the worst-case requirement to amortized, algorithms are only known for the CC model. In this article, we improve this state of the art by designing a deterministic algorithm that uses Fetch&Store to achieve amortized O(1) remote memory reference in both the CC and distributed shared memory models. Our algorithm supports Fast Abort (a process aborts within six steps of receiving the abort signal) and has the following additional desirable properties: it supports an arbitrary number of processes of arbitrary names, requires only O(1) space per process, and satisfies a novel fairness condition that we call Airline FCFS. Our algorithm is short with fewer than a dozen lines of code.
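The algorithm itself is not reproduced in this listing. As a rough, hypothetical illustration of how a single Fetch&Store (atomic exchange) can enqueue a waiter in a queue lock and how an abort can be handled, here is a toy CLH-style sketch; all names are ours, the atomic exchange is simulated with a mutex, and the sketch makes none of the paper's RMR, space, or fairness guarantees.

```python
import threading
import time

class FetchAndStore:
    """A simulated atomic Fetch&Store (exchange) cell."""
    def __init__(self, value):
        self._value = value
        self._mtx = threading.Lock()

    def exchange(self, new):
        with self._mtx:
            old, self._value = self._value, new
            return old

class AbortableLock:
    """Toy CLH-style abortable queue lock (illustrative only)."""
    def __init__(self):
        # A dummy head node acts as an already-released predecessor.
        self.tail = FetchAndStore({'state': 'released'})
        self._node = None

    def acquire(self, abort_signal):
        node = {'state': 'locked'}
        pred = self.tail.exchange(node)      # one Fetch&Store enqueues us
        while True:
            state = pred['state']
            if state == 'released':
                self._node = node            # we now hold the lock
                return True
            if state == 'abandoned':
                pred = pred['pred']          # skip over a waiter that aborted
                continue
            if abort_signal.is_set():
                node['pred'] = pred          # publish our predecessor first,
                node['state'] = 'abandoned'  # then mark ourselves abandoned
                return False
            time.sleep(0)                    # spin (local spinning in real CLH)

    def release(self):
        self._node['state'] = 'released'
```

A successor that finds its predecessor abandoned simply follows the abandoned node's saved `pred` pointer, so aborted waiters drop out of the queue without blocking anyone behind them.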
Citations: 2
Adaptive Erasure Coded Fault Tolerant Linear System Solver
IF 1.6 Q2 Computer Science Pub Date: 2021-12-08 DOI: 10.1145/3490557
X. Kang, D. Gleich, A. Sameh, A. Grama
As parallel and distributed systems scale, fault tolerance is an increasingly important problem—particularly on systems with limited I/O capacity and bandwidth. Erasure coded computations address this problem by augmenting a given problem instance with redundant data and then solving the augmented problem in a fault oblivious manner in a faulty parallel environment. In the event of faults, a computationally inexpensive procedure is used to compute the true solution from a potentially fault-prone solution. These techniques are significantly more efficient than conventional solutions to the fault tolerance problem. In this article, we show how we can minimize, to optimality, the overhead associated with our problem augmentation techniques for linear system solvers. Specifically, we present a technique that adaptively augments the problem only when faults are detected. At any point in execution, we only solve a system whose size is identical to the original input system. This has several advantages in terms of maintaining the size and conditioning of the system, as well as in only adding the minimal amount of computation needed to tolerate observed faults. We present, in detail, the augmentation process, the parallel formulation, and evaluation of performance of our technique. Specifically, we show that the proposed adaptive fault tolerance mechanism has minimal overhead in terms of FLOP counts with respect to the original solver executing in a non-faulty environment, has good convergence properties, and yields excellent parallel performance. We also demonstrate that our approach significantly outperforms an optimized application-level checkpointing scheme that only checkpoints needed data structures.
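This is not the paper's adaptive scheme, but the underlying erasure-coding idea can be sketched in a few lines: augment the system with a redundant equation, then, after a simulated fault erases one equation, recover the exact solution from the survivors (toy example in pure Python; the solver and matrices are ours).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Original 3x3 system with known solution x = [1, 2, 3].
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 10.0, 8.0]

# Augment with one redundant equation: the sum of all rows.
red_row = [sum(col) for col in zip(*A)]
red_rhs = sum(b)

# Simulate a fault that loses equation 1; recover from the survivors.
survivors_A = [A[0], A[2], red_row]
survivors_b = [b[0], b[2], red_rhs]
x = solve(survivors_A, survivors_b)
```

With one redundant row, any single lost equation leaves a full-rank system, so the true solution is recovered without rerunning the original computation.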
Citations: 0
Parallel Peeling of Bipartite Networks for Hierarchical Dense Subgraph Discovery
IF 1.6 Q2 Computer Science Pub Date: 2021-10-24 DOI: 10.1145/3583084
Kartik Lakhotia, R. Kannan, V. Prasanna
Wing and Tip decomposition are motif-based analytics for bipartite graphs that construct a hierarchy of butterfly (2,2-biclique) dense edge- and vertex-induced subgraphs, respectively. They have applications in several domains, including e-commerce, recommendation systems, document analysis, and others. Existing decomposition algorithms use a bottom-up approach that constructs the hierarchy in increasing order of subgraph density. They iteratively select the edges or vertices with minimum butterfly count and peel them, i.e., remove them along with their butterflies. The number of butterflies in real-world bipartite graphs makes bottom-up peeling computationally demanding. Furthermore, the strict order of peeling entities results in a large number of sequentially dependent iterations. Consequently, parallel algorithms based on bottom-up peeling incur heavy synchronization and poor scalability. In this article, we propose a novel Parallel Bipartite Network peelinG (PBNG) framework that adopts a two-phased peeling approach to relax the order of peeling and, in turn, dramatically reduce synchronization. The first phase divides the decomposition hierarchy into a few partitions and requires little synchronization. The second phase concurrently processes all partitions to generate individual levels of the hierarchy and requires no global synchronization. The two-phased peeling further enables batching optimizations that dramatically improve the computational efficiency of PBNG. We empirically evaluate PBNG using several real-world bipartite graphs and demonstrate radical improvements over the existing approaches. On a shared-memory 36-core server, PBNG achieves up to 19.7× self-relative parallel speedup. Compared to the state-of-the-art parallel framework ParButterfly, PBNG reduces synchronization by up to 15,260× and execution time by up to 295×. Furthermore, it achieves up to 38.5× speedup over state-of-the-art algorithms specifically tuned for wing decomposition.
Our source code is made available at https://github.com/kartiklakhotia/RECEIPT.
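As a minimal sequential illustration of the bottom-up tip peeling that PBNG improves on (our own toy code, not from the repository; real decompositions update butterfly counts incrementally instead of recounting after every peel):

```python
from itertools import combinations
from collections import defaultdict

def butterfly_counts(adj_u):
    """Per-vertex butterfly counts for the U side of a bipartite graph.
    adj_u maps each U vertex to its set of V neighbours."""
    counts = defaultdict(int)
    for u1, u2 in combinations(adj_u, 2):
        shared = len(adj_u[u1] & adj_u[u2])
        # Each pair of shared neighbours forms one butterfly (2,2-biclique).
        bfly = shared * (shared - 1) // 2
        counts[u1] += bfly
        counts[u2] += bfly
    return counts

def tip_peel(adj_u):
    """Sequential bottom-up tip decomposition: repeatedly peel a U vertex
    with minimum butterfly count. Returns the peeling order with tip numbers."""
    adj = {u: set(vs) for u, vs in adj_u.items()}
    order, tip = [], 0
    while adj:
        counts = butterfly_counts(adj)       # naive recount each iteration
        u = min(adj, key=lambda x: counts[x])
        tip = max(tip, counts[u])
        order.append((u, tip))
        del adj[u]
    return order
```

The strictly ordered `min` selection in each iteration is exactly the sequential dependence the abstract describes; PBNG's two-phased approach relaxes that order.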
Citations: 1
Metrics and Design of an Instruction Roofline Model for AMD GPUs
IF 1.6 Q2 Computer Science Pub Date: 2021-10-15 DOI: 10.1145/3505285
M. Leinhauser, R. Widera, S. Bastrakov, A. Debus, M. Bussmann, S. Chandrasekaran
Due to the recent announcement of the Frontier supercomputer, many scientific application developers are working to make their applications compatible with AMD (CPU-GPU) architectures, which means moving away from the traditional CPU and NVIDIA-GPU systems. Given the current limitations of profiling tools for AMD GPUs, this shift leaves a void in how to measure application performance on AMD GPUs. In this article, we design an instruction roofline model for AMD GPUs using AMD’s ROCProfiler and a benchmarking tool, BabelStream (the HIP implementation), as a way to measure an application’s performance in instructions and memory transactions on new AMD hardware. Specifically, we create instruction roofline models for a case-study scientific application, PIConGPU, an open-source particle-in-cell simulation application used for plasma and laser-plasma physics, on the NVIDIA V100, AMD Radeon Instinct MI60, and AMD Instinct MI100 GPUs. When looking at the performance of multiple kernels of interest in PIConGPU, we find that although the AMD MI100 GPU achieves a similar, or better, execution time compared to the NVIDIA V100 GPU, profiling-tool differences make comparing the performance of these two architectures hard. When looking at execution time, GIPS, and instruction intensity, the AMD MI60 achieves the worst performance of the three GPUs used in this work.
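The roofline bound itself is a one-line formula: attainable instruction throughput is the minimum of peak throughput and instruction intensity times memory-transaction bandwidth. A sketch with invented device numbers (not actual V100/MI60/MI100 figures):

```python
def attainable_gips(peak_gips, txn_bandwidth_gtxn_s, intensity_instr_per_txn):
    """Instruction roofline: performance is capped either by peak instruction
    throughput or by transaction bandwidth times instruction intensity
    (instructions executed per memory transaction)."""
    return min(peak_gips, intensity_instr_per_txn * txn_bandwidth_gtxn_s)

# Hypothetical device: 15 GIPS peak, 2.5 Gtransactions/s (illustrative only).
low = attainable_gips(15.0, 2.5, 2.0)    # low intensity: memory-bound
high = attainable_gips(15.0, 2.5, 10.0)  # high intensity: compute-bound
```

A kernel plotted below the "ridge point" (intensity = peak / bandwidth) is limited by memory transactions; above it, by instruction issue.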
Citations: 4
Introduction to the Special Issue for SPAA 2019
IF 1.6 Q2 Computer Science Pub Date: 2021-09-20 DOI: 10.1145/3477610
P. Berenbrink
1. Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Łącki, Vahab Mirrokni, Warren Schudy: Massively Parallel Computation via Remote Memory Access
2. Faith Ellen, Barun Gorain, Avery Miller, Andrzej Pelc: Constant-Length Labeling Schemes for Deterministic Radio Broadcast
3. Michael A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, and Yang Zhan: External-Memory Dictionaries in the Affine and PDAM Models.
Citations: 0
Massively Parallel Computation via Remote Memory Access
IF 1.6 Q2 Computer Science Pub Date: 2021-09-20 DOI: 10.1145/3470631
Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Lacki, V. Mirrokni, W. Schudy
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms [8, 30] using MapReduce and a distributed hash table service [17]. This extension allows us to give new graph algorithms with much lower round complexities compared to the best-known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in O(1) rounds and connectivity/minimum spanning tree in O(log log_{m/n} n) rounds, both using O(n^δ) space per machine for constant δ < 1. In the same memory regime for MPC, the best-known algorithms for these problems require poly log n rounds. Our results imply that the 2-CYCLE conjecture, which is widely believed to hold in the MPC model, does not hold in the AMPC model.
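The paper's algorithms are well beyond a listing like this, but the round-count contrast between MPC and AMPC can be sketched with a toy root-finding example (our own illustration; a plain dict stands in for the distributed data store):

```python
def mpc_pointer_jumping(parent):
    """MPC-style: each round, every vertex replaces its pointer by its
    pointer's pointer (reads only the previous round's snapshot), so reaching
    the root of a depth-d chain takes about log2(d) rounds."""
    ptr = dict(parent)
    rounds = 0
    while any(ptr[v] != ptr[ptr[v]] for v in ptr):
        ptr = {v: ptr[ptr[v]] for v in ptr}   # one synchronous round
        rounds += 1
    return ptr, rounds

def ampc_find_roots(parent):
    """AMPC-style: within a single round a machine may issue adaptive random
    reads against the round's data store, so each vertex can walk straight
    to its root in one round."""
    roots = {}
    for v in parent:
        u = v
        while parent[u] != u:
            u = parent[u]                     # adaptive remote read
        roots[v] = u
    return roots
```

The adaptivity — choosing the next read based on the previous read's result within the same round — is exactly what the MPC model forbids and the AMPC model adds.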
Citations: 4
Study of Fine-grained Nested Parallelism in CDCL SAT Solvers
IF 1.6 Q2 Computer Science Pub Date: 2021-09-20 DOI: 10.1145/3470639
J. Edwards, U. Vishkin
Boolean satisfiability (SAT) is an important performance-hungry problem with applications in many problem domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassing, parallelism. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, which all happen to be of the so-called Conflict-Directed Clause Learning variety. We show the potential for speedups of up to 382× across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to a computer architecture open problem we present.
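The abstract does not show the solvers' internals; for context, below is a minimal sequential unit-propagation kernel (our own illustrative code, not from the paper) — the kind of inner loop that fine-grained parallelization of CDCL solvers targets:

```python
def unit_propagate(clauses, assignment):
    """Extend a partial assignment by repeatedly applying the unit-clause rule.

    Literals are nonzero ints; -x is the negation of variable x. Returns the
    extended assignment (var -> bool), or None if a clause is falsified."""
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    unassigned.append(lit)
                elif (lit > 0) == val:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None                    # conflict: clause falsified
            if len(unassigned) == 1:           # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment
```

The scan over clauses here is the fine-grained work: each clause can be checked independently, which is what makes this loop a natural candidate for intra-decision parallelism.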
Citations: 3
External-memory Dictionaries in the Affine and PDAM Models
IF 1.6 Q2 Computer Science Pub Date: 2021-09-20 DOI: 10.1145/3470635
M. A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, P. Pandey, Donald E. Porter, Jun Yuan, Yang Zhan
Storage devices have complex performance profiles, including costs to initiate IOs (e.g., seek times in hard drives), parallelism and bank conflicts (in SSDs), costs to transfer data, and firmware-internal operations. The Disk-access Machine (DAM) model simplifies reality by assuming that storage devices transfer data in blocks of size B and that all transfers have unit cost. Despite its simplifications, the DAM model is reasonably accurate. In fact, if B is set to the half-bandwidth point, where the latency and bandwidth of the hardware are equal, then the DAM approximates the IO cost on any hardware to within a factor of 2. Furthermore, the DAM model explains the popularity of B-trees in the 1970s and the current popularity of Bɛ-trees and log-structured merge trees. But it fails to explain why some B-trees use small nodes, whereas all Bɛ-trees use large nodes. In a DAM, all IOs, and hence all nodes, are the same size. In this article, we show that the affine and PDAM models, which are small refinements of the DAM model, yield a surprisingly large improvement in predictability without sacrificing ease of use. We present benchmarks on a large collection of storage devices showing that the affine and PDAM models give good approximations of the performance characteristics of hard drives and SSDs, respectively. We show that the affine model explains node-size choices in B-trees and Bɛ-trees. Furthermore, the models predict that B-trees are highly sensitive to variations in the node size, whereas Bɛ-trees are much less sensitive. These predictions are borne out empirically. 
We conclude that the DAM model is useful as a first cut when designing or analyzing an algorithm or data structure but the affine and PDAM models enable the algorithm designer to optimize parameter choices and fill in design details.
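As a hypothetical numeric illustration of why the two models diverge (the setup/bandwidth ratio below is invented, not from the paper's benchmarks): under DAM every block IO costs 1 regardless of size, while the affine model charges a seek-like setup plus a size-proportional transfer, so batching many small IOs into one large IO pays off — which is the intuition behind large Bɛ-tree nodes.

```python
def dam_cost(num_block_ios):
    """DAM model: every block transfer costs exactly 1."""
    return num_block_ios

def affine_cost(io_sizes_bytes, setup=1.0, bytes_per_setup=1 << 20):
    """Affine model: each IO pays a fixed setup (seek-like) cost plus a
    transfer cost proportional to its size. Here one setup equals
    transferring 1 MiB -- a hypothetical hard-drive-like ratio."""
    return sum(setup + size / bytes_per_setup for size in io_sizes_bytes)

# Reading 4 MiB as one large IO vs. 1024 separate 4 KiB IOs:
one_big = affine_cost([4 << 20])         # 1 setup + 4 units of transfer
many_small = affine_cost([4096] * 1024)  # 1024 setups + the same transfer
```

Under DAM both access patterns move the same 1024 blocks and look identical; the affine model exposes the ~200× gap that real hard drives show.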
Citations: 2
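The abstract's half-bandwidth-point claim is easy to check numerically: with an affine IO cost of setup plus transfer time, choosing the block size B so that the two terms are equal makes the DAM unit-cost assumption accurate to within a factor of 2. A minimal sketch, with illustrative (not measured) device parameters:

```python
def affine_cost(num_bytes, setup_s, bandwidth_bps):
    """Affine IO model: every IO pays a fixed setup cost plus transfer time."""
    return setup_s + num_bytes / bandwidth_bps

def half_bandwidth_block(setup_s, bandwidth_bps):
    """Block size at which setup (latency) and transfer time are equal."""
    return setup_s * bandwidth_bps

# Illustrative hard-drive-like parameters (assumptions, not benchmark data).
SETUP = 5e-3          # 5 ms seek + rotational delay
BANDWIDTH = 100e6     # 100 MB/s sequential bandwidth

B = half_bandwidth_block(SETUP, BANDWIDTH)   # 500,000 bytes

# Charging one DAM "unit" = affine_cost(B, ...) per block-sized IO
# overestimates the true affine cost of any IO of b <= B bytes by at
# most 2x, since affine_cost(b) >= SETUP = affine_cost(B) / 2.
for b in (4096, B / 8, B):
    ratio = affine_cost(B, SETUP, BANDWIDTH) / affine_cost(b, SETUP, BANDWIDTH)
    assert 1.0 <= ratio <= 2.0
```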
Efficient Parallel 3D Computation of the Compressible Euler Equations with an Invariant-domain Preserving Second-order Finite-element Scheme
IF 1.6 Q2 Computer Science Pub Date : 2021-09-20 DOI: 10.1145/3470637
M. Maier, M. Kronbichler
We discuss the efficient implementation of a high-performance second-order collocation-type finite-element scheme for solving the compressible Euler equations of gas dynamics on unstructured meshes. The solver is based on the convex-limiting technique introduced by Guermond et al. (SIAM J. Sci. Comput. 40, A3211–A3239, 2018). As such, it is invariant-domain preserving; i.e., the solver maintains important physical invariants and is guaranteed to be stable without the use of ad hoc tuning parameters. This stability comes at the expense of a significantly more involved algorithmic structure that renders conventional high-performance discretizations challenging. We develop an algorithmic design that allows SIMD vectorization of the compute kernel, identify the main ingredients for a good node-level performance, and report excellent weak and strong scaling of a hybrid thread/MPI parallelization.
Citations: 15
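The "convex limiting" the abstract refers to can be illustrated in a deliberately tiny setting. The sketch below limits a single scalar against one lower bound (e.g., positivity of density); the actual scheme of Guermond et al. limits full states against an invariant set, and all names here are illustrative:

```python
def convex_limit(u_low, u_high, lower_bound=0.0):
    """Blend a bound-preserving low-order update with a high-order one.

    Returns u_low + l * (u_high - u_low) for the largest l in [0, 1] such
    that the result stays >= lower_bound. Assumes u_low >= lower_bound,
    which the low-order method guarantees by construction.
    """
    d = u_high - u_low
    if d >= 0.0:
        # Moving away from the bound: the full high-order update is safe.
        return u_high
    slack = u_low - lower_bound
    l = min(1.0, slack / (-d))   # largest admissible blending factor
    return u_low + l * d
```

For example, `convex_limit(0.1, -0.3)` returns 0.0: the high-order update would dip below the bound, so the limiter takes the largest admissible step toward it, preserving the invariant without any ad hoc tuning parameter.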
Constant-Length Labeling Schemes for Deterministic Radio Broadcast
IF 1.6 Q2 Computer Science Pub Date : 2021-09-20 DOI: 10.1145/3470633
Faith Ellen, B. Gorain, Avery Miller, A. Pelc
Broadcast is one of the fundamental network communication primitives. One node of a network, called the source, has a message that has to be learned by all other nodes. We consider broadcast in radio networks, modeled as simple undirected connected graphs with a distinguished source. Nodes communicate in synchronous rounds. In each round, a node can either transmit a message to all its neighbours, or stay silent and listen. At the receiving end, a node v hears a message from a neighbour w in a given round if v listens in this round and if w is its only neighbour that transmits in this round. If more than one neighbour of a node v transmits in a given round, we say that a collision occurs at v. We do not assume collision detection: in case of a collision, node v does not hear anything (except the background noise that it also hears when no neighbour transmits). We are interested in the feasibility of deterministic broadcast in radio networks. If nodes of the network do not have any labels, deterministic broadcast is impossible even in the four-cycle. On the other hand, if all nodes have distinct labels, then broadcast can be carried out, e.g., in a round-robin fashion, and hence O(log n)-bit labels are sufficient for this task in n-node networks. In fact, O(log Δ)-bit labels, where Δ is the maximum degree, are enough to broadcast successfully. Hence, it is natural to ask if very short labels are sufficient for broadcast. Our main result is a positive answer to this question. We show that every radio network can be labeled using 2 bits in such a way that broadcast can be accomplished by some universal deterministic algorithm that does not know the network topology nor any bound on its size. Moreover, at the expense of an extra bit in the labels, we can get the following additional strong property of our algorithm: there exists a common round in which all nodes know that broadcast has been completed. 
Finally, we show that 3-bit labels are also sufficient to solve both versions of broadcast in the case where it is not known a priori which node is the source.
Citations: 2
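The round-robin baseline mentioned in the abstract can be simulated directly: with distinct labels 0..n−1, the node labeled r mod n is the only transmitter in round r, so collisions never occur. A sketch under those assumptions (this is the O(log n)-bit baseline, not the paper's 2-bit labeling scheme):

```python
def round_robin_broadcast(adj, source):
    """Rounds until all nodes are informed under a round-robin schedule.

    adj: dict mapping node labels 0..n-1 to sets of neighbours
    (a simple undirected connected graph). Node i transmits only in
    rounds r with r % n == i, and only once it knows the message.
    """
    n = len(adj)
    informed = {source}
    rounds = 0
    while len(informed) < n:
        transmitter = rounds % n
        if transmitter in informed:
            # Exactly one node transmits per round, so every neighbour
            # hears it -- no collision can occur under this schedule.
            informed.update(adj[transmitter])
        rounds += 1
    return rounds

# Four-cycle 0-1-2-3-0: from source 0, nodes 0 and 1 together cover the
# whole cycle, so broadcast completes after 2 rounds.
C4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

Each full cycle of n rounds pushes the message at least one BFS layer outward, so the loop terminates within n·D rounds on a connected graph of diameter D.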