2012 IEEE 8th International Conference on E-Science: Latest Articles


Using Promethee methods for multi-criteria pull-based scheduling on DCIs

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404483

Scheduling tasks in distributed computing infrastructures (DCIs) is challenging, mainly because the scheduler faces a number of more or less dependent parameters that characterize both the tasks and the hosts coming from a particular computing environment. In this paper we introduce a multi-criteria scheduling method for DCIs that aims at a better matching between hosts and the tasks waiting in a priority queue at a pull-based scheduler. The novelty of the approach lies in employing the Promethee [1] decision aid for selecting tasks. To compute preference relationships (priorities) among tasks, the approach performs pairwise comparisons of the values that characterize them. The method has notable advantages, such as letting the user choose the values used to compute the priorities, for example the expected completion time (ECT) and cost. The approach is also very flexible: a set of parameters allows particular scheduling policies to be specified. To validate the method we built an XtremWeb-like simulator that is capable of running on real traces. We experiment on an internet desktop grid (IDG), a cloud and a best effort grid (BEG), with various workloads. The results show that the Promethee-based scheduling method performs well, especially on IDG when certain fractions of the tasks fail. We also prove that multi-criteria scheduling using Promethee performs better than single-criterion scheduling, improving both makespan and cost. In addition, a simple definition of ECT is the most efficient in terms of makespan. Finally, we explain the challenges of using Promethee for scheduling in DCIs.
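The pairwise comparison of task values described in this abstract can be sketched as follows. This is a minimal illustration of the generic PROMETHEE II net-flow ranking, not the authors' scheduler: the "usual" (strict) preference function, the criterion weights, and the task values are all assumptions chosen for the example.

```python
from typing import Dict, List


def usual_preference(diff: float) -> float:
    """'Usual' preference function: full preference for any strict advantage."""
    return 1.0 if diff > 0 else 0.0


def promethee_net_flows(tasks: Dict[str, List[float]],
                        weights: List[float],
                        maximize: List[bool]) -> Dict[str, float]:
    """Return the PROMETHEE II net outranking flow of every task.

    tasks    -- task name -> one value per criterion (hypothetical: ECT, cost)
    weights  -- relative importance of each criterion
    maximize -- True if larger values are better for that criterion
    """
    names = list(tasks)
    n = len(names)
    flows = {name: 0.0 for name in names}
    if n < 2:
        return flows  # nothing to compare
    total_weight = sum(weights)
    for a in names:
        for b in names:
            if a == b:
                continue
            # Aggregated preference of a over b across all criteria.
            pi_ab = 0.0
            for k, w in enumerate(weights):
                diff = tasks[a][k] - tasks[b][k]
                if not maximize[k]:
                    diff = -diff  # smaller is better for this criterion
                pi_ab += w * usual_preference(diff)
            pi_ab /= total_weight
            flows[a] += pi_ab / (n - 1)  # contributes to a's positive flow
            flows[b] -= pi_ab / (n - 1)  # and to b's negative flow
    return flows


# Hypothetical queue: [expected completion time, cost], both to be minimized.
queue = {"t1": [10.0, 5.0], "t2": [20.0, 2.0], "t3": [15.0, 4.0]}
flows = promethee_net_flows(queue, weights=[0.7, 0.3], maximize=[False, False])
ranking = sorted(flows, key=flows.get, reverse=True)
print(ranking)
```

In a pull-based setting, the task with the highest net flow would be the one handed to the requesting host. Changing the weights or the preference function is one way such a scheme could express different scheduling policies.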


Authors: M. Moca, G. Fedak

Citations: 4

Integration of modern data management practice with scientific workflows

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404426

N. Killeen Jason M. Lohrey M. Farrell Wilson Liu S. Garic G. Egan D. Abramson H. Nguyen

Modern science increasingly involves managing and processing large amounts of distributed data accessed by global teams of researchers. To do this, we need systems that combine data, metadata and workflows in a single system. This paper discusses such a system, built from a number of existing technologies. We demonstrate its effectiveness with a case study that analyses MRI data.


Citations: 5

MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404489

G. Klyne C. Goble Jun Zhao Matthew Gamble

Linked Data holds great promise in the Life Sciences as a platform to enable an interoperable data commons, supporting new opportunities for discovery. Minimum Information Checklists have emerged within the Life Sciences as a means of standardising the reporting of experiments in an effort to increase the quality and reusability of the reported data. Existing tooling built around these checklists is aimed at supporting experimental scientists in producing compliant experiment reports. It remains a challenge to quickly and easily assess an arbitrary set of data against these checklists. We present the MIM (Minimum Information Model) vocabulary and framework, which aims to provide a practical and scalable approach to describing and assessing Linked Data against minimum information checklists. The MIM framework aims to support three core activities: (1) publishing well-described minimum information checklists in RDF as Linked Data; (2) publishing Linked Data against these checklists; and (3) validating existing “in the wild” Linked Data against a published checklist. We discuss the design considerations of the vocabulary and present its main classes. We demonstrate the utility of the framework with a checklist designed for publishing Chemical Structure Linked Data, using data extracted from Wikipedia as an example.
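The third activity, validating existing data against a checklist, can be illustrated with a deliberately simplified sketch. It does not use RDF or the actual MIM vocabulary: the checklist is assumed to be a plain set of required property names, and every identifier (`ex:label`, `ex:inchi`, the resource names) is hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical checklist: properties every chemical-structure record must carry.
# These names are illustrative only and are not terms from the MIM vocabulary.
CHEMICAL_CHECKLIST: Set[str] = {"ex:label", "ex:inchi"}


def missing_properties(resource: Dict[str, List[str]],
                       required: Set[str]) -> Set[str]:
    """Return the required properties that are absent or have no values."""
    return {prop for prop in required if not resource.get(prop)}


def validate(dataset: Dict[str, Dict[str, List[str]]],
             required: Set[str]) -> Dict[str, Set[str]]:
    """Map each non-compliant resource to the properties it is missing."""
    report = {}
    for resource_id, properties in dataset.items():
        missing = missing_properties(properties, required)
        if missing:
            report[resource_id] = missing
    return report


# Hypothetical data, standing in for records extracted from a public source.
dataset = {
    "ex:compoundA": {"ex:label": ["Compound A"], "ex:inchi": ["<InChI string>"]},
    "ex:compoundB": {"ex:label": ["Compound B"]},
}
report = validate(dataset, CHEMICAL_CHECKLIST)
print(report)
```

A real implementation would presumably express both the checklist and the data as RDF and evaluate the requirements with graph queries; the dictionary form here only shows the shape of the check.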


Citations: 16

Towards a quantitative academic internationalization assessment of Brazilian research groups

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404456

R. M. C. Junior Evelyn Perez Cervantes J. Mena-Chalco

This paper introduces a new computational method to automatically estimate the International Publication Ratio (IPR) based on an analysis of the bibliographic production of Brazilian research groups, a task that would be too difficult (in many cases, impossible) to perform manually. The proposed method uses the DOI number to identify the countries of every co-author who participated in each publication. Using bibliometric data from the Brazilian Lattes platform, we show that it is possible to make a good estimate of the IPR for research groups. Calculating the IPR is important for quantitatively evaluating scientific progress and for comparing academic institutions or knowledge areas. Experiments on research groups formed by the 100 most collaborative researchers of five major Brazilian knowledge areas confirm that our proposal is an effective way to infer the IPR.
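The abstract does not give the formula for the IPR, so the following is only a sketch under a stated assumption: the IPR is taken to be the fraction of a group's publications that have at least one co-author affiliated outside the home country. The DOI-based country lookup is not shown; the per-publication country lists are assumed to be already resolved, and the function name and country codes are hypothetical.

```python
from typing import Iterable, List


def international_publication_ratio(publications: Iterable[List[str]],
                                    home_country: str = "BR") -> float:
    """Fraction of publications with at least one foreign co-author.

    publications -- one list of co-author country codes per publication
                    (assumed to be resolved beforehand, e.g. via the DOI)
    home_country -- country code treated as domestic (assumption: Brazil)
    """
    pubs = list(publications)
    if not pubs:
        return 0.0  # no publications, so no ratio to report
    international = sum(
        1 for countries in pubs
        if any(country != home_country for country in countries)
    )
    return international / len(pubs)


# Hypothetical research group with four publications.
group = [["BR", "BR"], ["BR", "US"], ["BR"], ["FR", "BR", "BR"]]
ipr = international_publication_ratio(group)
print(ipr)
```

Comparing this value across groups or knowledge areas is the kind of quantitative evaluation the abstract refers to, although the paper's exact definition may differ.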


Citations: 7

Towards next generations of software for distributed infrastructures: The European Middleware Initiative

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404415

F. Estrella B. Kónya A. Ceccanti Patrick Fuhrmam C. Aiftimiei A. D. Meglio E. Giorgio A. Aimar M. Cecchi M. Riedel L. Field J. Nilsen J. White

The last two decades have seen an exceptional increase in the available networking, computing and storage resources. Scientific research communities have exploited these enhanced capabilities by developing large-scale collaborations supported by distributed infrastructures. To enable the use of such infrastructures, several middleware solutions have been created. However, because these solutions were developed separately, they have often resulted in incompatible middleware and infrastructures. The European Middleware Initiative (EMI) is a collaboration, started in 2010, among the major European middleware providers (ARC, dCache, gLite, UNICORE). It aims to consolidate and evolve the existing middleware stacks, facilitating their interoperability and their deployment on large distributed infrastructures, while establishing a sustainable model for the future maintenance and evolution of the middleware components. This paper presents the strategy followed to achieve these goals: after an analysis of the situation before EMI, it gives an overview of the development strategy, followed by the most notable technical results, grouped according to the four development areas (Compute, Data, Infrastructure, Security). The rigorous process that ensures the quality of the provided software is then illustrated, followed by a description of the release process and of the relations with the user communities. The last section provides an outlook on the future, focusing on the ongoing actions aimed at the sustainability of these activities.


Citations: 21

Pilot abstractions for compute, data, and network

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404459

S. Jha S. Olabarriaga M. Santcroos D. Katz

Scientific experiments in a variety of domains are producing increasing amounts of data that need to be processed efficiently. Distributed Computing Infrastructures are increasingly important in fulfilling these large-scale computational requirements.


Citations: 3

CINET: A cyberinfrastructure for network science

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404422

Networks are an effective abstraction for representing real systems. Consequently, network science is increasingly used in academia and industry to solve problems in many fields. Computations that determine structure properties and dynamical behaviors of networks are useful because they give insights into the characteristics of real systems. We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs such computations, with the following features: (i) it offers realistic networks from the literature and various random and deterministic network generators; (ii) it provides many algorithmic modules and measures to study and characterize networks; (iii) it is designed for efficient execution of complex algorithms on distributed high performance computers so that they scale to large networks; and (iv) it is hosted with web interfaces so that those without direct access to high performance computing resources and those who are not computing experts can still reap the system benefits. It is a combination of application design and cyberinfrastructure that makes these features possible. To our knowledge, these capabilities collectively make CINET novel. We describe the system and illustrative use cases, with a focus on the CINET user.


Authors: S. Abdelhamid, R. Aló, S. Arifuzzaman, P. Beckman, Md Hasanuzzaman Bhuiyan, K. Bisset, E. Fox, Geoffrey Fox, K. Hall, S. Hasan, A. Joshi, Maleq Khan, C. Kuhlman, Spencer J. Lee, J. Leidig, Hemanth Makkapati, M. Marathe, H. Mortveit, J. Qiu, S. Ravi, Z. Shams, O. Sirisaengtaksin, R. Subbiah, S. Swarup, N. Trebon, A. Vullikanti, Zhao Zhao

Citations: 19

Experiences in the design and implementation of a Social Cloud for Volunteer Computing

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404452

K. Bubendorfer K. Chard Ryan Chard

Volunteer computing provides an alternative computing paradigm for establishing the resources required to support large scale scientific computing. The model is particularly well suited for projects that have high popularity and little available computing infrastructure. The premise of volunteer computing platforms is the contribution of computing resources by individuals for little to no gain. It is therefore difficult to attract and retain contributors to projects. The Social Cloud for Volunteer Computing aims to exploit social engineering principles and the ubiquity of social networks to increase the outreach of volunteer computing, by providing an integrated volunteer computing application and creating gamification algorithms based on social principles to encourage contribution. In this paper we present the development of a production SoCVC, detailing the architecture, implementation and performance of the SoCVC Facebook application and show that the approach proposed could have a high impact on volunteer computing projects.


Citations: 6

Cooperative VM migration for a virtualized HPC cluster with VMM-bypass I/O devices

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404487

Takahiro Hirofuchi H. Nakada Ryousei Takano T. Kudoh Yoshio Tanaka

An HPC cloud, a flexible and robust cloud computing service specially dedicated to high performance computing, is a promising future e-Science platform. In cloud computing, virtualization is widely used to achieve flexibility and security. Virtualization makes migration or checkpoint/restart of computing elements (virtual machines) easy, and such features are useful for realizing fault tolerance and server consolidation. However, in widely used virtualization schemes, I/O devices are also virtualized, and thus I/O performance is severely degraded. To cope with this problem, VMM-bypass I/O technologies, including PCI passthrough and SR-IOV, which significantly reduce the I/O overhead, have been introduced. However, such VMM-bypass I/O technologies make it impossible to migrate or checkpoint/restart virtual machines, since the virtual machines are directly attached to hardware devices. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operation. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, and then realizes VM-level migration and checkpoint/restart by combining PCI hotplug with coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including Infiniband and Myrinet, are supported without implementing specific para-virtualized drivers, and it is not necessary to modify either the MPI runtime or the applications. Using the proposed mechanism, we demonstrate reactive and proactive FT mechanisms on a virtualized Infiniband cluster. We have confirmed its effectiveness using both a memory-intensive micro benchmark and the NAS parallel benchmark. Moreover, we show that postcopy live migration enables us to reduce the downtime of an application as the memory footprint increases.


Citations: 12

WorkflowSim: A toolkit for simulating scientific workflows in distributed environments

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404430

E. Deelman Weiwei Chen

Simulation is one of the most popular evaluation methods in scientific workflow studies. However, existing workflow simulators fail to provide a framework that takes heterogeneous system overheads and failures into consideration. They also lack support for widely used workflow optimization techniques such as task clustering. In this paper, we introduce WorkflowSim, which extends the existing CloudSim simulator by providing a higher layer of workflow management. We also show that ignoring system overheads and failures when simulating scientific workflows can cause significant inaccuracies in the predicted workflow runtime. To further validate its value in promoting other research work, we introduce two promising research areas for which WorkflowSim provides a unique and effective evaluation platform.


Citations: 434
