
Latest Publications: 2014 IEEE 28th International Parallel and Distributed Processing Symposium

Shedding Light on Lithium/Air Batteries Using Millions of Threads on the BG/Q Supercomputer
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.81
V. Weber, C. Bekas, T. Laino, A. Curioni, A. Bertsch, S. Futral
In this work, we present a novel parallelization scheme for a highly efficient evaluation of the Hartree-Fock exact exchange (HFX) in ab initio molecular dynamics simulations, specifically tailored for condensed phase simulations. Our developments allow one to achieve the necessary accuracy for the evaluation of the HFX in a highly controllable manner. We show here that our solutions can take great advantage of the latest trends in HPC platforms, such as extreme threading, short vector instructions and highly dimensional interconnection networks. Indeed, all these trends are evident in the IBM Blue Gene/Q supercomputer. We demonstrate an unprecedented scalability up to 6,291,456 threads (96 BG/Q racks) with a near perfect parallel efficiency, which represents a more than 20-fold improvement as compared to the current state of the art. In terms of reduction of time to solution, we achieved an improvement that can surpass a 10-fold decrease in runtime with respect to directly comparable approaches. We exploit this development to enhance the accuracy of DFT based molecular dynamics by using the PBE0 hybrid functional. This approach allowed us to investigate the chemical behavior of organic solvents in one of the most challenging research topics in energy storage, lithium/air batteries, and to propose alternative solvents with enhanced stability to ensure an appropriate reversible electrochemical reaction. This step is key for the development of a viable lithium/air storage technology, which would have been a daunting computational task using standard methods. Recent research has shown that the electrolyte plays a key role in non-aqueous lithium/air batteries in producing the appropriate reversible electrochemical reduction. In particular, the chemical degradation of propylene carbonate, the typical electrolyte used, by lithium peroxide has been demonstrated by molecular dynamics simulations of highly realistic models. Reaching the necessary high accuracy in these simulations is a daunting computational task using standard methods.
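For reference, the PBE0 hybrid functional mentioned in the abstract admixes a fixed quarter of Hartree-Fock exact exchange into the PBE exchange-correlation energy; this is the standard textbook definition, not a formula specific to this paper:

```latex
E_{xc}^{\mathrm{PBE0}} \;=\; \tfrac{1}{4}\,E_{x}^{\mathrm{HF}} \;+\; \tfrac{3}{4}\,E_{x}^{\mathrm{PBE}} \;+\; E_{c}^{\mathrm{PBE}}
```

The costly term is $E_{x}^{\mathrm{HF}}$, which is exactly the HFX evaluation whose parallelization the paper addresses.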
Cited by: 12
Characterization and Optimization of Memory-Resident MapReduce on HPC Systems
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.87
Yandong Wang, R. Goldstone, Weikuan Yu, Teng Wang
MapReduce is a widely accepted framework for addressing big data challenges. Recently, it has also gained broad attention from scientists at the U.S. leadership computing facilities as a promising solution to process gigantic simulation results. However, conventional high-end computing systems are constructed based on the compute-centric paradigm while big data analytics applications prefer a data-centric paradigm such as MapReduce. This work characterizes the performance impact of key differences between compute- and data-centric paradigms and then provides optimizations to enable a dual-purpose HPC system that can efficiently support conventional HPC applications and new data analytics applications. Using a state-of-the-art MapReduce implementation Spark and the Hyperion system at Lawrence Livermore National Laboratory, we have examined the impact of storage architectures, data locality and task scheduling to the memory-resident MapReduce jobs. Based on our characterization and findings of the performance behaviors, we have introduced two optimization techniques, namely Enhanced Load Balancer and Congestion-Aware Task Dispatching, to improve the performance of Spark applications.
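The abstract names the two optimizations but not their mechanics. As a rough, hypothetical illustration (all names and thresholds below are assumptions, not the authors' implementation), a congestion-aware dispatcher might prefer nodes that already hold a task's input block, falling back to lightly loaded remote nodes when every local candidate is congested:

```python
def dispatch(task, nodes, congestion_threshold=0.8):
    """Pick a node for `task`, preferring data locality but avoiding
    congested nodes (hypothetical sketch, not the paper's code)."""
    # Nodes that already hold the task's input block.
    local = [n for n in nodes if task["block"] in n["blocks"]]
    # Prefer the least-loaded local node unless it exceeds the threshold.
    for n in sorted(local, key=lambda n: n["load"]):
        if n["load"] < congestion_threshold:
            return n["name"]
    # Otherwise fall back to the least-loaded node in the cluster.
    return min(nodes, key=lambda n: n["load"])["name"]

nodes = [
    {"name": "n1", "blocks": {"b1"}, "load": 0.9},
    {"name": "n2", "blocks": {"b1"}, "load": 0.3},
    {"name": "n3", "blocks": set(),  "load": 0.1},
]
print(dispatch({"block": "b1"}, nodes))  # n2: local and uncongested
```

The real system must additionally cope with HPC storage architectures (e.g. shared parallel file systems) where the notion of "local" data differs from a classic MapReduce cluster.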
Cited by: 61
Fair Maximal Independent Sets
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.79
Jeremy T. Fineman, Calvin C. Newport, M. Sherr, Tonghe Wang
Finding a maximal independent set (MIS) is a classic problem in graph theory that has been widely studied in the context of distributed algorithms. Standard distributed solutions to the MIS problem focus on time complexity. In this paper, we also consider fairness. For a given MIS algorithm A and graph G, we define the inequality factor for A on G to be the largest ratio between the probabilities of the nodes joining an MIS in the graph. We say an algorithm is fair with respect to a family of graphs if it achieves a constant inequality factor for all graphs in the family. In this paper, we seek efficient and fair algorithms for common graph families. We begin by describing an algorithm that is fair and runs in O(log* n) time in rooted trees of size n. Moving to unrooted trees, we describe a fair algorithm that runs in O(log n) time. Generalizing further to bipartite graphs, we describe a third fair algorithm that requires O(log² n) rounds. We also show a fair algorithm for planar graphs that runs in O(log² n) rounds, and describe an algorithm that can be run in any graph, yielding good bounds on inequality in regions that can be efficiently colored with a small number of colors. We conclude our theoretical analysis with a lower bound that identifies a graph where all MIS algorithms achieve an inequality bound in Ω(n), eliminating the possibility of an MIS algorithm that is fair in all graphs. Finally, to motivate the need for provable fairness guarantees, we simulate both our tree algorithm and Luby's MIS algorithm [13] in a variety of different tree topologies, some synthetic and some derived from real-world data. Whereas our algorithm always yields an inequality factor ≤ 3.25 in these simulations, Luby's algorithm yields factors as large as 168.
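The inequality factor defined above can be estimated empirically by Monte Carlo simulation, which is essentially what the paper's experimental section does for Luby's algorithm. A minimal sketch (not the authors' code) running Luby's randomized MIS algorithm on a small path graph and estimating each node's probability of joining the MIS:

```python
import random

def luby_mis(adj):
    """One run of Luby's randomized MIS algorithm on an adjacency dict."""
    active = set(adj)
    mis = set()
    while active:
        r = {v: random.random() for v in active}
        # A node joins the MIS if its random value beats all active neighbours.
        winners = {v for v in active
                   if all(r[v] < r[u] for u in adj[v] if u in active)}
        mis |= winners
        # Remove winners and their neighbours before the next round.
        active -= winners | {u for v in winners for u in adj[v]}
    return mis

def inequality_factor(adj, trials=2000):
    """Estimated largest ratio between nodes' probabilities of joining the MIS."""
    counts = {v: 0 for v in adj}
    for _ in range(trials):
        for v in luby_mis(adj):
            counts[v] += 1
    probs = [c / trials for c in counts.values()]
    return max(probs) / min(probs)

# 4-node path a-b-c-d: endpoints join the MIS far more often than inner nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(inequality_factor(path))
```

On such small trees the estimated factor stays modest; the paper's point is that on larger tree topologies Luby's algorithm can reach factors as large as 168, whereas their fair algorithm stays below 3.25.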
Cited by: 3
A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.44
G. Coviello, S. Cadambi, S. Chakradhar
We propose a cluster scheduling technique for compute clusters with Xeon Phi coprocessors. Even though the Xeon Phi runs Linux, which allows multiprocessing, cluster schedulers generally do not allow jobs to share coprocessors because sharing can cause oversubscription of coprocessor memory and thread resources. It has been shown that memory or thread oversubscription on a many-core processor like the Phi results in job crashes or drastic performance loss. We first show that such an exclusive device allocation policy causes severe coprocessor underutilization: for typical workloads, on average only 38% of the Xeon Phi cores are busy across the cluster. Then, to improve coprocessor utilization, we propose a scheduling technique that enables safe coprocessor sharing without resource oversubscription. Jobs specify their maximum memory and thread requirements, and our scheduler packs as many jobs as possible on each coprocessor in the cluster, subject to resource limits. We solve this problem using a greedy approach at the cluster level combined with a knapsack-based algorithm for each node. Every coprocessor is modeled as a knapsack, and jobs are packed into each knapsack with the goal of maximizing job concurrency, i.e., as many jobs as possible executing on each coprocessor. Given a set of jobs, we show that this strategy of packing for high concurrency is a good proxy for (i) reducing makespan, without the need for users to specify job execution times, and (ii) reducing coprocessor footprint, or the number of coprocessors required to finish the jobs without increasing makespan. We implement the entire system as a seamless add-on to Condor, a popular distributed job scheduler, and show makespan and footprint reductions of more than 50% across a wide range of workloads.
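A much-simplified greedy version of such packing (a hypothetical first-fit sketch; the paper uses a knapsack formulation per node, which this does not reproduce) places each job on the first coprocessor whose remaining memory and thread budget can hold it, so neither resource is ever oversubscribed:

```python
def pack_jobs(jobs, coprocessors):
    """First-fit-decreasing packing of jobs onto coprocessors, respecting
    declared memory (GB) and thread limits (toy sketch, not the paper's code)."""
    placement = {}
    free = {c["name"]: {"mem": c["mem"], "threads": c["threads"]}
            for c in coprocessors}
    # Place memory-hungry jobs first so small jobs fill the remaining gaps.
    for job in sorted(jobs, key=lambda j: j["mem"], reverse=True):
        for name, f in free.items():
            if job["mem"] <= f["mem"] and job["threads"] <= f["threads"]:
                f["mem"] -= job["mem"]
                f["threads"] -= job["threads"]
                placement[job["id"]] = name
                break
    return placement

phis = [{"name": "mic0", "mem": 8, "threads": 240}]
jobs = [{"id": "j1", "mem": 5, "threads": 120},
        {"id": "j2", "mem": 3, "threads": 100},
        {"id": "j3", "mem": 2, "threads": 60}]
print(pack_jobs(jobs, phis))  # j1 and j2 share mic0; j3 does not fit
```

Unplaced jobs (here j3) would wait for the next scheduling cycle; the authors' Condor add-on makes these decisions cluster-wide.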
Cited by: 3
It's About Time: On Optimal Virtual Network Embeddings under Temporal Flexibilities
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.14
Matthias Rost, S. Schmid, A. Feldmann
Distributed applications often require high-performance networks with strict connectivity guarantees. For instance, many cloud applications suffer from today's variations of the intra-cloud bandwidth, which leads to poor and unpredictable application performance. Accordingly, we witness a trend towards virtual networks (VNets), which can provide resource isolation. Interestingly, while the problem of where to embed a VNet is fairly well understood today, much less is known about when to optimally allocate a VNet. This, however, is important, as the requirements specified for a VNet do not have to be static, but can vary over time and even include certain temporal flexibilities. This paper initiates the study of the temporal VNet embedding problem (TVNEP). We propose a continuous-time mathematical programming approach to solve the TVNEP, and present and compare different algorithms. Based on these insights, we present the CSM-Model, which incorporates both symmetry and state-space reductions to significantly speed up the process of computing exact solutions to the TVNEP. Based on the CSM-Model, we derive a greedy algorithm, OGA, to compute fast approximate solutions. In an extensive computational evaluation, we show that despite the hardness of the TVNEP, the CSM-Model is sufficiently powerful to solve moderately sized instances to optimality within one hour and under different objective functions (such as maximizing the number of embeddable VNets). We also show that the greedy algorithm exploits flexibilities well and yields good solutions. More generally, our results suggest that even modest temporal flexibility can improve overall system performance significantly.
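The abstract does not spell out the OGA heuristic. As a toy illustration of why temporal flexibility helps (hypothetical sketch, not the authors' algorithm), a greedy scheduler can shift a flexible request within its start-time window so that concurrent bandwidth demand never exceeds the substrate capacity:

```python
def schedule(requests, capacity):
    """Greedily assign each request the earliest start in its window such
    that total concurrent demand never exceeds `capacity` (toy sketch).
    Each request: (name, demand, duration, earliest start, latest start)."""
    load = {}    # time slot -> allocated demand
    starts = {}  # request name -> chosen start time
    for name, demand, dur, lo, hi in requests:
        for t0 in range(lo, hi + 1):
            if all(load.get(t, 0) + demand <= capacity
                   for t in range(t0, t0 + dur)):
                for t in range(t0, t0 + dur):
                    load[t] = load.get(t, 0) + demand
                starts[name] = t0
                break
    return starts

reqs = [("v1", 8, 2, 0, 0),   # fixed: must start at t=0
        ("v2", 8, 2, 0, 3)]   # flexible: can be deferred until v1 finishes
print(schedule(reqs, capacity=10))  # {'v1': 0, 'v2': 2}
```

Without v2's flexibility, both requests would contend at t=0 and one would be rejected; with it, both are embeddable, which is exactly the "maximize the number of embeddable VNets" objective mentioned above.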
Cited by: 24
Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.58
A. Haidar, Chongxiao Cao, A. YarKhan, P. Luszczek, S. Tomov, K. Kabir, J. Dongarra
Many of the heterogeneous resources available to modern computers are designed for different workloads. In order to efficiently use GPU resources, the workload must have a greater degree of parallelism than a workload designed for multicore-CPUs. And conceptually, the Intel Xeon Phi coprocessors are capable of handling workloads somewhere in between the two. This multitude of applicable workloads will likely lead to mixing multicore-CPUs, GPUs, and Intel coprocessors in multi-user environments that must offer adequate computing facilities for a wide range of workloads. In this work, we are using a lightweight runtime environment to manage the resource-specific workload, and to control the dataflow and parallel execution in two-way hybrid systems. The lightweight runtime environment uses task superscalar concepts to enable the developer to write serial code while providing parallel execution. In addition, our task abstractions enable unified algorithmic development across all the heterogeneous resources. We provide performance results for dense linear algebra applications, demonstrating the effectiveness of our approach and full utilization of a wide variety of accelerator hardware.
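The core of the task-superscalar idea is that the developer writes serial-looking code while the runtime extracts parallelism from declared data dependences. A minimal sketch of that dependence tracking (hypothetical toy model; real runtimes like the one described add data movement, heterogeneity-aware placement, and true concurrency):

```python
def run_tasks(tasks):
    """Execute tasks in dependence order: a task is ready once every datum
    it reads has been produced (toy model of task-superscalar scheduling).
    Each task: (name, reads, writes, callable)."""
    produced = set()
    done, order = set(), []
    while len(done) < len(tasks):
        for i, (name, reads, writes, fn) in enumerate(tasks):
            if i not in done and all(r in produced for r in reads):
                fn()                      # in a real runtime: dispatched to
                produced |= set(writes)   # a CPU core, GPU, or coprocessor
                done.add(i)
                order.append(name)
                break
        else:
            raise RuntimeError("cyclic or unsatisfiable dependences")
    return order

log = []
tasks = [
    ("gemm",  ["A", "B"], ["C"], lambda: log.append("gemm")),
    ("potrf", [],         ["A"], lambda: log.append("potrf")),
    ("load",  [],         ["B"], lambda: log.append("load")),
]
print(run_tasks(tasks))  # ['potrf', 'load', 'gemm']
```

Here "gemm" is listed first in program order but runs last, because its inputs A and B are produced by the other two tasks; "potrf" and "load" are independent and could run concurrently on different devices.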
Cited by: 35
Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.83
Jae-Seung Yeom, A. Bhatele, K. Bisset, Eric J. Bohm, Abhishek K. Gupta, L. Kalé, M. Marathe, Dimitrios S. Nikolopoulos, M. Schulz, Lukasz Wesolowski
Modeling dynamical systems represents an important application class covering a wide range of disciplines including but not limited to biology, chemistry, finance, national security, and health care. Such applications typically involve large-scale, irregular graph processing, which makes them difficult to scale due to the evolutionary nature of their workload, irregular communication and load imbalance. EpiSimdemics is such an application simulating epidemic diffusion in extremely large and realistic social contact networks. It implements a graph-based system that captures dynamics among co-evolving entities. This paper presents an implementation of EpiSimdemics in Charm++ that enables future research by social, biological and computational scientists at unprecedented data and system scales. We present new methods for application-specific processing of graph data and demonstrate the effectiveness of these methods on a Cray XE6, specifically NCSA's Blue Waters system.
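Epidemic diffusion over a contact network spreads along edges between co-evolving entities. A deterministic miniature of that graph-based process (a toy sketch for intuition only; EpiSimdemics is agent-based, stochastic, and massively parallel):

```python
def spread(adj, seeds, steps):
    """Toy epidemic spread: at each step, every newly infected node infects
    all of its susceptible contacts (deterministic, probability-1 model)."""
    infected = set(seeds)
    newly = set(seeds)
    history = [len(infected)]  # cumulative infections per step
    for _ in range(steps):
        newly = {u for v in newly for u in adj[v]} - infected
        infected |= newly
        history.append(len(infected))
    return history

# Small contact network: a hub in contact with three people, one chain onward.
adj = {"hub": ["a", "b", "c"], "a": ["hub", "d"], "b": ["hub"],
       "c": ["hub"], "d": ["a"]}
print(spread(adj, seeds={"hub"}, steps=3))  # [1, 4, 5, 5]
```

The irregularity the paper grapples with is visible even here: the hub generates far more work than a leaf, which at scale causes the load imbalance and irregular communication the abstract mentions.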
Citations: 43
Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.25
Bogdan Nicolae, Pierre Riteau, K. Keahey
Storage elasticity on IaaS clouds is an important feature for data-intensive workloads: storage requirements can vary greatly during application runtime, making worst-case over-provisioning a poor choice that leads to unnecessarily tied-up storage and extra costs for the user. While the ability to adapt dynamically to storage requirements is thus attractive, how to implement it is not well understood. Current approaches simply rely on users to attach and detach virtual disks to the virtual machine (VM) instances and then manage them manually, thus greatly increasing application complexity while reducing cost efficiency. Unlike such approaches, this paper aims to provide a transparent solution that presents a unified storage space to the VM in the form of a regular POSIX file system that hides the details of attaching and detaching virtual disks by handling those actions transparently based on dynamic application requirements. The main difficulty in this context is to understand the intent of the application and regulate the available storage in order to avoid running out of space while minimizing the performance overhead of doing so. To this end, we propose a storage space prediction scheme that analyzes multiple system parameters and dynamically adapts monitoring based on the intensity of the I/O in order to get as close as possible to the real usage. We show the value of our proposal over static worst-case over-provisioning and simpler elastic schemes that rely on a reactive model to attach and detach virtual disks, using both synthetic benchmarks and real-life data-intensive applications. Our experiments demonstrate that we can reduce storage waste/cost by 30-40% with only 2-5% performance overhead.
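The abstract does not spell out the prediction scheme's internals. As a hedged, toy illustration of the predictive-versus-reactive idea only (linear extrapolation of recent usage, a fixed hypothetical disk size, and a simplistic attach policy — all assumptions, not the paper's method), a sketch might look like:

```python
# Toy sketch of predictive storage provisioning: extrapolate recent usage
# growth and attach another fixed-size virtual disk before the predicted
# usage exceeds current capacity. Illustrative only; DISK_SIZE and the
# linear predictor are assumptions, not the paper's scheme.

DISK_SIZE = 100  # GB per virtual disk (hypothetical)

def predict_usage(samples, horizon):
    """Linear extrapolation of the last growth step, `horizon` steps ahead."""
    if len(samples) < 2:
        return samples[-1]
    slope = samples[-1] - samples[-2]  # growth per step
    return samples[-1] + slope * horizon

def provision(usage_trace, horizon=3):
    """Return the capacity seen at each step under the predictive policy."""
    capacity = DISK_SIZE
    capacities = []
    seen = []
    for usage in usage_trace:
        seen.append(usage)
        # Attach disks ahead of time if predicted usage would overflow.
        while predict_usage(seen, horizon) > capacity:
            capacity += DISK_SIZE
        capacities.append(capacity)
    return capacities

trace = [10, 30, 50, 70, 90, 110]  # GB used at each step
caps = provision(trace)
```

A purely reactive policy would attach only once usage actually crossed the threshold, which is the behavior the paper compares against.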
Citations: 28
Accelerating MPI Collective Communications through Hierarchical Algorithms Without Sacrificing Inter-Node Communication Flexibility
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.32
Benjamin S. Parsons, Vijay S. Pai
This paper presents and evaluates a universal algorithm to improve the performance of MPI collective communication operations on hierarchical clusters with many-core nodes. This algorithm exploits shared-memory buffers for efficient intra-node communication while still allowing the use of unmodified, hierarchy-unaware traditional collectives for inter-node communication (including collectives like Alltoallv). This algorithm improves on past works that convert a specific collective algorithm into a hierarchical version and are generally restricted to fan-in, fan-out, and Allgather algorithms. Experimental results show impressive performance improvements utilizing a variety of collectives from MPICH as well as the closed-source Cray MPT for the inter-node communication. The experimental evaluation tests the new algorithms with as many as 65536 cores and sees speedups over the baseline averaging 14.2x for Alltoallv, 26x for Allgather, and 32.7x for Reduce-Scatter. The paper further improves inter-node communication by utilizing multiple senders from the same shared memory buffer, achieving additional speedups averaging 2.5x. The discussion also evaluates special-purpose extensions to improve intra-node communication by returning shared memory or copy-on-write protected buffers from the collective.
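The two-level structure the abstract describes — combine within a node through shared memory, then run an unmodified, hierarchy-unaware collective among one leader per node — can be sketched in plain Python (a single-process simulation, not MPI; rank layout and the sum reduction are illustrative assumptions):

```python
# Toy simulation of a hierarchical Allreduce (sum): ranks first combine
# within their node (standing in for the shared-memory buffer), then one
# leader per node takes part in an ordinary inter-node reduction whose
# algorithm stays unchanged and hierarchy-unaware. Illustrative only.

def hierarchical_allreduce(values, ranks_per_node):
    """values[i] is rank i's contribution; ranks are packed by node."""
    # Phase 1: intra-node combine into each node leader (shared memory).
    nodes = [values[i:i + ranks_per_node]
             for i in range(0, len(values), ranks_per_node)]
    node_sums = [sum(node) for node in nodes]
    # Phase 2: unmodified inter-node collective among the leaders only.
    total = sum(node_sums)
    # Phase 3: leaders publish the result back through shared memory.
    return [total] * len(values)

result = hierarchical_allreduce(list(range(8)), ranks_per_node=4)
```

The point of the structure is that phase 2 sees only one participant per node, so any existing inter-node algorithm can be reused without modification.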
Citations: 6
Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
Pub Date : 2014-05-19 DOI: 10.1109/IPDPS.2014.48
I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, J. Dongarra
The Generalized Minimum Residual (GMRES) method is one of the most widely-used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because in comparison to floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present the detailed performance studies of a matrix powers kernel on multiple GPUs, we particularly focus on orthogonalization strategies that have a great impact on both the numerical stability and performance of GMRES, especially as the matrix becomes sparser or ill-conditioned. We present the experimental results on two eight-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in other iterative solvers besides GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but they also provide insight about the effects of these optimization techniques on the performance of the sparse solvers, and may have greater impact beyond GMRES.
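The "matrix powers kernel" the abstract mentions computes the Krylov basis {v, Av, A²v, ..., Aˢv} for s steps at once, so the communication of s separate matrix-vector products can be amortized. A minimal dense, unpreconditioned sketch of what that kernel produces (the fusion and GPU partitioning that make it communication-avoiding are omitted; the example matrix is made up):

```python
# Toy sketch of the matrix powers kernel's output in CA-GMRES: the s+1
# Krylov basis vectors [v, Av, A^2 v, ..., A^s v]. Dense and sequential
# here; the real kernel fuses these products to avoid communication.

def matvec(A, v):
    """Dense matrix-vector product with A stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matrix_powers(A, v, s):
    """Return the s+1 basis vectors [v, Av, ..., A^s v]."""
    basis = [v]
    for _ in range(s):
        basis.append(matvec(A, basis[-1]))
    return basis

A = [[2, 0], [1, 1]]
V = matrix_powers(A, [1, 1], s=2)
```

Once the basis is available, an orthogonalization step (the strategies the paper studies) turns it into an orthonormal Krylov basis; doing that stably for a basis computed in bulk is exactly where the accuracy-versus-communication trade-off lies.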
Citations: 50
Journal
2014 IEEE 28th International Parallel and Distributed Processing Symposium