
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing: Latest Publications

Deep Learning in Cancer and Infectious Disease: Novel Driver Problems for Future HPC Architecture
Rick L. Stevens
The adoption of machine learning is proving to be an amazingly successful strategy in improving predictive models for cancer and infectious disease. In this talk I will discuss two projects my group is working on to advance biomedical research through the use of machine learning and HPC. In cancer, machine learning, and deep learning in particular, is used to advance our ability to diagnose and classify tumors. Recently demonstrated automated systems are routinely outperforming human expertise. Deep learning is also being used to predict patient response to cancer treatments and to screen for new anti-cancer compounds. In basic cancer research it is being used to supervise large-scale, multi-resolution molecular dynamics simulations that explore cancer gene signaling pathways. In public health it is being used to interpret millions of medical records to identify optimal treatment strategies. In infectious disease research, machine learning methods are being used to predict antibiotic resistance and to identify novel antibiotic resistance mechanisms that might be present. More generally, machine learning is emerging as a general tool to augment and extend mechanistic models in biology and many other fields, and it is becoming an important component of scientific workloads. From a computational architecture standpoint, deep neural network (DNN) based scientific applications have some unique requirements. They require high compute density to support matrix-matrix and matrix-vector operations, but they rarely require 64 or even 32 bits of precision, so architects are creating new instructions and new design points to accelerate training. Most current DNNs rely on dense fully connected networks and convolutional networks and thus are reasonably matched to current HPC accelerators; future DNNs, however, may rely less on dense communication patterns. Like simulation codes, power-efficient DNNs require high-bandwidth memory placed physically close to arithmetic units to reduce the cost of data motion, and a high-bandwidth communication fabric between (perhaps modest-scale) groups of processors to support network model parallelism. DNNs in general do not have good strong-scaling behavior, so to fully exploit large-scale parallelism they rely on a combination of model, data, and search parallelism. Deep learning problems also require large quantities of training data to be made available or generated at each node, providing opportunities for NVRAM. Discovering optimal deep learning models often involves a large-scale search of hyperparameters; it is not uncommon to search a space of tens of thousands of model configurations. Naïve searches are outperformed by various intelligent search strategies, including new approaches that use generative neural networks to manage the search space. HPC architectures that can support these large-scale intelligent search methods as well as efficient model training are needed.
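As a rough illustration of the large-scale hyperparameter search mentioned above, the sketch below contrasts a naïve exhaustive search with a simple successive-halving strategy. The toy scoring function, search space, and budgets are assumptions made for illustration; they are not taken from the talk or any specific system.

```python
# Minimal sketch (not the speaker's system): naive full-budget search vs. a
# successive-halving strategy over a hypothetical DNN hyperparameter space.
import random

def train_and_score(config, budget):
    """Stand-in for training a DNN for `budget` epochs; returns a mock score."""
    lr, width = config["lr"], config["width"]
    # Pretend wider nets with moderate learning rates do better, plus noise.
    return -abs(lr - 0.01) * 100 + width / 256 + random.gauss(0, 0.1) + budget * 0.01

def naive_search(configs, budget):
    """Train every configuration at full budget and keep the best."""
    return max(configs, key=lambda c: train_and_score(c, budget))

def successive_halving(configs, total_budget):
    """Spend small budgets on many configs, then promote the best half."""
    budget, survivors = 1, list(configs)
    while len(survivors) > 1 and budget <= total_budget:
        scored = sorted(survivors, key=lambda c: train_and_score(c, budget), reverse=True)
        survivors = scored[: max(1, len(scored) // 2)]
        budget *= 2
    return survivors[0]

if __name__ == "__main__":
    random.seed(0)
    space = [{"lr": 10 ** random.uniform(-4, -1), "width": random.choice([64, 128, 256, 512])}
             for _ in range(64)]
    print("naive best:  ", naive_search(space, budget=16))
    print("halving best:", successive_halving(space, total_budget=16))
```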
Citations: 3
COS: A Parallel Performance Model for Dynamic Variations in Processor Speed, Memory Speed, and Thread Concurrency
Bo Li, E. León, K. Cameron
Highly parallel, high-performance scientific applications must maximize performance within a power envelope while maintaining scalability. Emerging parallel and distributed systems offer a growing number of operating modes that provide unprecedented control over processor speed, memory latency, and memory bandwidth. Optimizing these systems for performance and power requires an understanding of the combined effects of these modes and thread concurrency on execution time. In this paper, we describe how an analytical performance model that separates pure computation time (C) and pure stall time (S) from computation-memory overlap time (O) can accurately capture these combined effects. We apply the COS model to predict the performance of thread and power mode combinations to within 7% and 17% for parallel applications (e.g., LULESH) on Intel x86 and IBM BG/Q architectures, respectively. The key insight of the COS model is that the combined effects of processor and memory throttling and concurrency on overlap trend differently than their combined effects on pure computation and pure stall time. The COS model is novel in that it enables independent approximation of overlap, which leads to capabilities and accuracies that are as good as or better than the best available approaches.
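The C/S/O decomposition described above can be made concrete with a toy predictor. The scaling rules and numbers below are assumptions of this sketch, not the calibrated COS model from the paper.

```python
# Hedged sketch of the idea behind the COS decomposition: total time splits
# into pure compute (C), pure stall (S), and compute-memory overlap (O), and
# each term scales differently under processor or memory throttling.
def cos_predict(C, S, O, cpu_slowdown=1.0, mem_slowdown=1.0):
    """Predict runtime under throttling.

    C, S, O: seconds of pure compute, pure stall, and overlap at base speed.
    cpu_slowdown, mem_slowdown: >= 1.0; e.g. 2.0 means half the base speed.
    The overlap rule below is an illustrative assumption.
    """
    c = C * cpu_slowdown                          # pure compute tracks the core clock
    s = S * mem_slowdown                          # pure stalls track memory speed
    o = max(O * cpu_slowdown, O * mem_slowdown)   # overlap hides the cheaper of the two
    return c + s + o

# Example: a phase measured as 6 s compute, 2 s stall, 4 s overlap at base
# speed, re-run with memory throttled to half speed.
print(cos_predict(6.0, 2.0, 4.0, cpu_slowdown=1.0, mem_slowdown=2.0))
```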
Citations: 8
Better Safe than Sorry: Grappling with Failures of In-Memory Data Analytics Frameworks
Bogdan Ghit, D. Epema
Providing fault tolerance is of major importance for data analytics frameworks such as Hadoop and Spark, which are typically deployed in large clusters that are known to experience high failure rates. Unexpected events such as compute node failures are a particularly important challenge for in-memory data analytics frameworks, as the widely adopted approach to deal with them is to recompute work already done. Recomputing lost work, however, requires allocating extra resources to re-execute tasks, thus increasing job runtimes. To address this problem, we design a checkpointing system called Panda that is tailored to the intrinsic characteristics of data analytics frameworks. In particular, Panda employs fine-grained checkpointing at the level of task outputs and dynamically identifies tasks that are worthwhile to checkpoint rather than recompute. As has been abundantly shown, the tasks of data analytics jobs may have very variable runtimes and output sizes. These properties form the basis of three checkpointing policies which we incorporate into Panda. We first empirically evaluate Panda on a multicluster system with single data analytics applications under space-correlated failures, and find that Panda comes close to the performance of a failure-free execution in unmodified Spark for a large range of concurrent failures. We then perform simulations of complete workloads, mimicking the size and operation of a Google cluster, and show that Panda provides significant improvements in average job runtime for wide ranges of failure rate and system load.
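A minimal sketch of the kind of cost comparison a task-output checkpointing policy might make. The bandwidth figure, failure probability, and decision rule below are illustrative assumptions, not Panda's actual policies.

```python
# Decide per task whether writing its output to stable storage is cheaper in
# expectation than re-executing the task after a failure (toy model).
def should_checkpoint(task_runtime_s, output_bytes, failure_prob,
                      write_bandwidth_Bps=200e6):
    """Checkpoint if expected recompute cost exceeds the cost of saving output."""
    checkpoint_cost = output_bytes / write_bandwidth_Bps
    expected_recompute_cost = failure_prob * task_runtime_s
    return expected_recompute_cost > checkpoint_cost

# A long-running task with a small output is worth checkpointing...
print(should_checkpoint(task_runtime_s=600, output_bytes=50e6, failure_prob=0.05))
# ...while a quick task with a huge output is cheaper to recompute.
print(should_checkpoint(task_runtime_s=5, output_bytes=20e9, failure_prob=0.05))
```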
Citations: 17
IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases
Dong Dai, Wei Zhang, Yong Chen
Graphs have become increasingly important in many applications and domains, such as querying relationships in social networks or managing the rich metadata generated in scientific computing. Many of these use cases require high-performance distributed graph databases that serve continuous updates from clients while, at the same time, answering complex queries over the current graph. These operations in graph databases, also referred to as online transaction processing (OLTP) operations, have specific design and implementation requirements for graph partitioning algorithms. In this research, we argue that it is necessary to consider connectivity and vertex degree changes during graph partitioning. Based on this idea, we designed an Incremental Online Graph Partitioning (IOGP) algorithm that responds accordingly to incremental changes in vertex degree. IOGP helps achieve better locality, generate balanced partitions, and increase the parallelism for accessing the high-degree vertices of the graph. On both real-world and synthetic graphs, IOGP demonstrates as much as 2x better query performance with less than 10% overhead when compared against state-of-the-art graph partitioning algorithms.
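A hedged sketch of incremental, degree-aware vertex placement in the spirit described above. The load-balance factor, degree threshold, and split handling are illustrative assumptions, not the IOGP algorithm itself.

```python
# Toy incremental placement: prefer a neighbor's partition for locality, fall
# back to the least-loaded partition, and flag high-degree vertices whose edge
# lists would be spread across servers for parallel access.
from collections import defaultdict

class IncrementalPartitioner:
    def __init__(self, num_parts, high_degree_threshold=100):
        self.num_parts = num_parts
        self.threshold = high_degree_threshold
        self.location = {}                      # vertex -> partition
        self.degree = defaultdict(int)
        self.load = [0] * num_parts

    def add_edge(self, u, v):
        for x in (u, v):
            if x not in self.location:
                self._place(x, neighbor=v if x == u else u)
            self.degree[x] += 1
            if self.degree[x] == self.threshold:
                self._mark_for_split(x)

    def _place(self, vertex, neighbor):
        # Keep edges local unless the neighbor's partition is much more loaded
        # than the least-loaded one (1.2 is an assumed imbalance tolerance).
        least = min(range(self.num_parts), key=lambda p: self.load[p])
        target = self.location.get(neighbor, least)
        if self.load[target] > 1.2 * (self.load[least] + 1):
            target = least
        self.location[vertex] = target
        self.load[target] += 1

    def _mark_for_split(self, vertex):
        # A real system would spread this vertex's edge list over servers.
        print(f"vertex {vertex} crossed degree {self.threshold}: split its edges")

p = IncrementalPartitioner(num_parts=4)
for edge in [(1, 2), (2, 3), (1, 3), (3, 4)]:
    p.add_edge(*edge)
print(p.location, p.load)
```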
Citations: 29
Caches All the Way Down: Infrastructure for Data Intensive Science
D. Abramson
The rise of big data science has created new demands for modern computer systems. While floating-point performance has driven computer architecture and system design for the past few decades, there is renewed interest in the speed at which data can be ingested and processed. Early exemplars such as Gordon, the NSF-funded system at the San Diego Supercomputing Centre, shifted the focus from pure floating-point performance to memory and I/O rates. At the University of Queensland we have continued this trend with the design of FlashLite, a parallel cluster equipped with large amounts of main memory, flash disk, and a distributed shared memory system (ScaleMP's vSMP). This allows applications to place data "close" to the processor, enhancing processing speeds. Further, we have built a geographically distributed, multi-tier hierarchical data fabric called MeDiCI, which provides an abstraction of very large data stores across the metropolitan area. MeDiCI leverages industry solutions such as IBM's Spectrum Scale and SGI's DMF platforms. Caching underpins both FlashLite and MeDiCI. In this talk I will describe the design decisions and illustrate some early application studies that benefit from the approach. I will also highlight some of the challenges that need to be solved for this approach to become mainstream.
Citations: 1
Machine and Application Aware Partitioning for Adaptive Mesh Refinement Applications
Milinda Fernando, Dmitry Duplyakin, H. Sundar
Load balancing and partitioning are critical when it comes to parallel computations. Popular partitioning strategies based on space-filling curves focus on dividing work equally, and the partitions produced are independent of the architecture or the application. Given the ever-increasing relative cost of data movement and the increasing heterogeneity of our architectures, it is no longer sufficient to consider only an equal partitioning of work; minimizing communication costs is equally important, if not more so. Our hypothesis is that an unequal partitioning that significantly minimizes communication costs can scale and perform better than conventional equal-work partitioning schemes. This tradeoff is dependent on the architecture as well as the application. We validate our hypothesis in the context of a finite-element computation utilizing adaptive mesh refinement. Our central contribution is a new partitioning scheme that minimizes the overall runtime of subsequent computations by performing architecture- and application-aware non-uniform work assignment in order to decrease time to solution, primarily by minimizing data movement. We evaluate our algorithm by comparing it against standard space-filling-curve-based partitioning algorithms and observing time to solution as well as energy to solution for finite element computations on adaptively refined meshes. We demonstrate excellent scalability of our new partitioning algorithm up to 262,144 cores on ORNL's Titan and demonstrate that the proposed partitioning scheme reduces both overall energy and time to solution for application codes by up to 22.0%.
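To make the idea of non-uniform, communication-aware work assignment concrete, the sketch below partitions elements already ordered along a space-filling curve using weights that mix compute work with an assumed, architecture-dependent communication penalty. The cost factor and data are illustrative, not the paper's scheme.

```python
# Toy weighted partitioning of SFC-ordered elements into contiguous chunks.
def weighted_sfc_partition(work, comm_bytes, num_parts, comm_cost_per_byte=1e-3):
    # Per-element weight: compute work plus an assumed penalty for the data
    # that element must communicate (the penalty factor is machine-dependent).
    weights = [w + comm_cost_per_byte * b for w, b in zip(work, comm_bytes)]
    target = sum(weights) / num_parts
    parts, current, acc = [], [], 0.0
    for elem, wt in enumerate(weights):
        current.append(elem)
        acc += wt
        if acc >= target and len(parts) < num_parts - 1:
            parts.append(current)
            current, acc = [], 0.0
    parts.append(current)
    return parts

# 12 elements in SFC order: uniform work, but elements 4-7 are communication-heavy,
# so the middle partition receives fewer elements.
work = [1.0] * 12
comm = [0, 0, 0, 0, 900, 900, 900, 900, 0, 0, 0, 0]
print(weighted_sfc_partition(work, comm, num_parts=3))
```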
Citations: 19
Building Secure Platforms for Research on Human Subjects: The Importance of Computer Scientists
J. Lane
Businesses and government are using new approaches to decision-making. They are exploiting new streams of (mostly) digital personal data, such as daily transaction records, web-browsing data, cell phone location data, and social media activity, and they are applying new analytical models and tools. Social science researchers, who are not trained in the stewardship of these new kinds of data, must now collect, manage and use them appropriately. There are many technical challenges: disparate datasets must be ingested, their provenance determined and metadata documented. Researchers must be able to query datasets to know what data are available and how they can be used. Datasets must be joined in a scientific manner, which means that workflows need to be traced and managed in such a way that the research can be replicated (Lane, 2017). Computer scientists' expertise is of critical value in many of these areas, but of greatest interest to this group are the facilities in which data on human subjects are stored. The data must be securely housed, and privacy and confidentiality must be protected using the best approaches available. Access and use must be documented to meet the needs of data providers. Yet the technology currently used to provide access to sensitive data is largely artisanal and manual. The stewardship restrictions placed on the use of confidential administrative data prevent the use of best practices for research data management. As a result, links between data sources are rarely validated, results often are not replicated, and connected datasets, results, and methods are not accessible to subsequent researchers in the same field. This is where computer scientists' expertise can come into play in building approaches that will enable sensitive data from different sources to be discovered, integrated, and analyzed in a carefully controlled manner, and that will, furthermore, allow researchers to share analysis methods, results, and expertise in ways not easily possible today.
Citations: 0
Explaining Wide Area Data Transfer Performance
Zhengchun Liu, Prasanna Balaprakash, R. Kettimuthu, Ian T Foster
Disk-to-disk wide-area file transfers involve many subsystems and tunable application parameters that pose significant challenges for bottleneck detection, system optimization, and performance prediction. Performance models can be used to address these challenges but have not proved generally usable because of the need for extensive online experiments to characterize subsystems. We show here how to overcome the need for such experiments by applying machine learning methods to historical data to estimate parameters for predictive models. Starting with log data for millions of Globus transfers involving billions of files and hundreds of petabytes, we engineer features for endpoint CPU load, network interface card load, and transfer characteristics, and we use these features in both linear and nonlinear models of transfer performance. We show that the resulting models have high explanatory power. For a representative set of 30,653 transfers over 30 heavily used source-destination pairs ("edges"), totaling 2,053 TB in 46.6 million files, we obtain median absolute percentage prediction errors (MdAPE) of 7.0% and 4.6% when using distinct linear and nonlinear models per edge, respectively; when using a single nonlinear model for all edges, we obtain an MdAPE of 7.8%. Our work broadens understanding of the factors that influence file transfer rate by clarifying relationships between achieved transfer rates, transfer characteristics, and competing load. Our predictions can be used for distributed workflow scheduling and optimization, and our features can also be used for optimization and explanation.
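The evaluation metric (MdAPE) and the flavor of the feature-based models can be sketched briefly. The features, coefficients, and numbers below are hypothetical stand-ins, not values from the paper.

```python
# Median absolute percentage error between predicted and achieved transfer
# rates, plus a tiny linear model over hypothetical engineered features.
from statistics import median

def mdape(actual, predicted):
    return 100.0 * median(abs(a - p) / a for a, p in zip(actual, predicted))

def predict_rate(features, coeffs, intercept):
    """Linear model: rate = intercept + sum(coeff_i * feature_i)."""
    return intercept + sum(c * f for c, f in zip(coeffs, features))

# Hypothetical features per transfer: [src CPU load, NIC load, mean file size (GB)],
# paired with the achieved rate in MB/s.
transfers = [([0.2, 0.1, 1.5], 820.0), ([0.7, 0.5, 0.1], 310.0), ([0.4, 0.2, 4.0], 950.0)]
coeffs, intercept = [-400.0, -300.0, 60.0], 840.0
preds = [predict_rate(f, coeffs, intercept) for f, _ in transfers]
print("MdAPE = %.1f%%" % mdape([r for _, r in transfers], preds))
```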
Citations: 39
LetGo: A Lightweight Continuous Framework for HPC Applications Under Failures
Bo Fang, Qiang Guan, Nathan Debardeleben, K. Pattabiraman, M. Ripeanu
Requirements for reliability, low power consumption, and performance place complex and conflicting demands on the design of high-performance computing (HPC) systems. Fault-tolerance techniques such as checkpoint/restart (C/R) protect HPC applications against hardware faults. These techniques, however, have non-negligible overheads, particularly when the fault rate exposed by the hardware is high: it is estimated that in future HPC systems, up to 60% of the computational cycles/power will be used for fault tolerance. To mitigate the overall overhead of fault-tolerance techniques, we propose LetGo, an approach that attempts to continue the execution of an HPC application when crashes would otherwise occur. Our hypothesis is that a class of HPC applications have good enough intrinsic fault tolerance that it is possible to re-purpose the default mechanism that terminates an application once a crash-causing error is signalled, and instead attempt to repair the corrupted application state and continue the application execution. This paper explores this hypothesis and quantifies the impact of using this observation in the context of checkpoint/restart (C/R) mechanisms. Our fault-injection experiments using a suite of five HPC applications show that, on average, LetGo is able to elide 62% of the crashes encountered by applications, of which 80% result in correct output, while incurring negligible performance overhead. As a result, when LetGo is used in conjunction with a C/R scheme, it enables significantly higher efficiency, thereby leading to faster time to solution.
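A Python-level illustration of the "continue instead of crash" idea: the real LetGo system intercepts crash signals in compiled applications, so the NaN check and neighbor-averaging repair below are stand-in assumptions meant only to show how an iterative solver's intrinsic fault tolerance can absorb a repaired error.

```python
# Toy Jacobi sweep that repairs corrupted cells and keeps running instead of
# terminating, relying on the solver's tolerance to small perturbations.
import random

def repair(grid, i):
    """Replace a corrupted cell with a value interpolated from its neighbors."""
    left = grid[i - 1] if i > 0 else grid[i + 1]
    right = grid[i + 1] if i < len(grid) - 1 else grid[i - 1]
    grid[i] = 0.5 * (left + right)

def jacobi_step(grid):
    new = grid[:]
    for i in range(1, len(grid) - 1):
        if random.random() < 0.001:             # injected fault: corrupt a cell
            grid[i] = float("nan")
        if grid[i] != grid[i]:                  # NaN check stands in for a crash
            repair(grid, i)                     # "let go": fix state, keep running
        new[i] = 0.5 * (grid[i - 1] + grid[i + 1])
    return new

grid = [0.0] + [1.0] * 98 + [0.0]
for _ in range(200):
    grid = jacobi_step(grid)
print("interior mean after 200 sweeps: %.3f" % (sum(grid[1:-1]) / 98))
```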
Citations: 20
MaDaTS: Managing Data on Tiered Storage for Scientific Workflows
D. Ghoshal, L. Ramakrishnan
Scientific workflows are increasingly used in High Performance Computing (HPC) environments to manage complex simulations and analyses, often consuming and generating large amounts of data. However, workflow tools have limited support for managing the input, output and intermediate data. The data elements of a workflow are often managed by the user through scripts or other ad-hoc mechanisms. Technology advances for future HPC systems are redefining the memory and storage subsystem by introducing additional tiers to improve the I/O performance of data-intensive applications. These architectural changes introduce additional complexities to managing data for scientific workflows. Thus, we need to manage scientific workflow data across the tiered storage systems of HPC machines. In this paper, we present the design and implementation of MaDaTS (Managing Data on Tiered Storage for Scientific Workflows), a software architecture that manages data for scientific workflows. We introduce the Virtual Data Space (VDS), an abstraction of the data in a workflow that hides the complexities of the underlying storage system while allowing users to control data management strategies. We evaluate the data management strategies with real scientific and synthetic workflows, and demonstrate the capabilities of MaDaTS. Our experiments demonstrate the flexibility, performance and scalability gains of MaDaTS as compared to the traditional approach of managing data in scientific workflows.
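A hedged sketch of what a virtual data space abstraction might look like: workflow tasks name logical data objects and a pluggable strategy decides which storage tier backs each one. The tier names, paths, and placement rule are assumptions for illustration, not MaDaTS's API.

```python
# Toy "virtual data space": logical names mapped to physical tiers by strategy.
import shutil
from pathlib import Path

TIERS = {"burst_buffer": Path("/bb"), "scratch": Path("/scratch"), "archive": Path("/archive")}

class VirtualDataSpace:
    def __init__(self, strategy):
        self.strategy = strategy          # callable: (name, hints) -> tier name
        self.mapping = {}                 # logical name -> physical path

    def create(self, name, **hints):
        tier = self.strategy(name, hints)
        path = TIERS[tier] / name
        self.mapping[name] = path
        return path

    def resolve(self, name):
        return self.mapping[name]

    def persist(self, name, tier="archive"):
        """Copy a logical object to a slower, durable tier at workflow end."""
        src, dst = self.mapping[name], TIERS[tier] / name
        shutil.copy(src, dst)             # would be a real copy on a real system
        self.mapping[name] = dst

def storage_aware(name, hints):
    # Keep intermediates close to compute; send other outputs to scratch.
    return "burst_buffer" if hints.get("intermediate") else "scratch"

vds = VirtualDataSpace(storage_aware)
print(vds.create("mesh.h5", intermediate=True))    # -> /bb/mesh.h5
print(vds.create("result.nc"))                     # -> /scratch/result.nc
```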
Citations: 20