Using Promethee methods for multi-criteria pull-based scheduling on DCIs
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404483
M. Moca, G. Fedak
Scheduling tasks in distributed computing infrastructures (DCIs) is challenging mainly because the scheduler faces a number of more or less dependent parameters that characterize both the hosts of a particular computing environment and the tasks. In this paper we introduce a multi-criteria scheduling method for DCIs, aiming at a better matching between hosts and the tasks waiting in a priority queue at a pull-based scheduler. The novelty of the approach consists in employing the Promethee [1] decision aid for selecting tasks. To compute preference relationships (priorities) among tasks, the approach performs pairwise comparisons of the values that characterize them. The method has interesting advantages, such as letting the user choose which values enter the priority computation, for instance the expected completion time (ECT) and cost. The approach is also very flexible, allowing particular scheduling policies to be specified through a set of parameters. To validate the method we built an XtremWeb-like simulator capable of replaying real traces. We experiment on an Internet desktop grid (IDG), a cloud, and a best-effort grid (BEG) with various workloads. The results show that the Promethee-based scheduling method performs well, especially on IDG when certain fractions of the tasks fail. We also show that multi-criteria scheduling using Promethee outperforms single-criterion scheduling, improving both makespan and cost, and that a simple definition of ECT is the most efficient in terms of makespan. Finally, we explain the challenges of using Promethee for scheduling in DCIs.
{"title":"Using Promethee methods for multi-criteria pull-based scheduling on DCIs","authors":"M. Moca, G. Fedak","doi":"10.1109/eScience.2012.6404483","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404483","url":null,"abstract":"Scheduling tasks in distributed computing infrastructures (DCIs) is challenging mainly because the scheduler is facing a number of more or less dependent parameters that characterize the hosts coming from a particular computing environment and the tasks. In this paper we introduce a multi-criteria scheduling method for DCIs, aiming a better matching between hosts, and tasks waiting in a priority queue at a pull-based scheduler. The novelty of the approach consists in employing the Promethee [1] decision aid for selecting tasks. In the aim of computing preference relationships (priorities) among tasks, this approach performs pairwise comparisons of values that characterize tasks. The method exhibits interesting advantages, such as allowing the user to choose the values for the computation of the priorities, like the expected completion time (ECT) and cost. The approach is also very flexible, allowing through a set of parameters the specification of particular scheduling policies. To validate this method we built an XtrebWeb-like simulator, which is capable of running on real traces. We experiment on internet desktop grid (IDG), cloud and best effort grid (BEG), with various workloads. The results show that the Promethee-based scheduling method obtains good performance especially on IDG when certain fractions of the tasks fail. We also prove that multi-criteria scheduling using Promethee performs better than single-criterion scheduling, improving both makespan and cost. Also, a simple definition of ECT is the most efficient in terms of makespan. In this work we also explain the challenges of using Promethee for scheduling in DCIs.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"25 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84210702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integration of modern data management practice with scientific workflows
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404426
N. Killeen, Jason M. Lohrey, M. Farrell, Wilson Liu, S. Garic, D. Abramson, H. Nguyen, G. Egan
Modern science increasingly involves managing and processing large amounts of distributed data accessed by global teams of researchers. This requires systems that combine data, metadata and workflows into a single system. This paper discusses such a system, built from a number of existing technologies, and demonstrates its effectiveness on a case study that analyses MRI data.
{"title":"Integration of modern data management practice with scientific workflows","authors":"N. Killeen, Jason M. Lohrey, M. Farrell, Wilson Liu, S. Garic, D. Abramson, H. Nguyen, G. Egan","doi":"10.1109/eScience.2012.6404426","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404426","url":null,"abstract":"Modern science increasingly involves managing and processing large amounts of distributed data accessed by global teams of researchers. To do this, we need systems that combine data, meta-data and workflows into a single system. This paper discusses such a system, built from a number of existing technologies. We demonstrate the effectiveness on a case study that analyses MRI data.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"4 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87411427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404489
Matthew Gamble, C. Goble, G. Klyne, Jun Zhao
Linked Data holds great promise in the Life Sciences as a platform for an interoperable data commons, supporting new opportunities for discovery. Minimum Information Checklists have emerged within the Life Sciences as a means of standardising the reporting of experiments, in an effort to increase the quality and reusability of the reported data. Existing tooling built around these checklists is aimed at supporting experimental scientists in producing compliant experiment reports; it remains a challenge to quickly and easily assess an arbitrary set of data against these checklists. We present the MIM (Minimum Information Model) vocabulary and framework, which aims to provide a practical and scalable approach to describing and assessing Linked Data against minimum information checklists. The MIM framework supports three core activities: (1) publishing well-described minimum information checklists in RDF as Linked Data; (2) publishing Linked Data against these checklists; and (3) validating existing “in the wild” Linked Data against a published checklist. We discuss the design considerations of the vocabulary and present its main classes. We demonstrate the utility of the framework with a checklist designed for publishing Chemical Structure Linked Data, using data extracted from Wikipedia as an example.
{"title":"MIM: A Minimum Information Model vocabulary and framework for Scientific Linked Data","authors":"Matthew Gamble, C. Goble, G. Klyne, Jun Zhao","doi":"10.1109/ESCIENCE.2012.6404489","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404489","url":null,"abstract":"Linked Data holds great promise in the Life Sciences as a platform to enable an interoperable data commons, supporting new opportunities for discovery. Minimum Information Checklists have emerged within the Life Sciences as a means of standardising the reporting of experiments in an effort to increase the quality and reusability of the reported data. Existing tooling built around these checklists is aimed at supporting experimental scientists in the production of experiment reports that are compliant. It remains a challenge to quickly and easily assess an arbitrary set of data against these checklists. We present the MIM (Minimum Information Model) vocabulary and framework which aims to provide a practical, and scalable approach to describing and assessing Linked Data against minimum information checklists. The MIM framework aims to support three core activities: (1) publishing well described minimum information checklists in RDF as Linked Data; (2) publishing Linked Data against these checklists; and (3) validating existing “in the wild” Linked Data against a published checklist. We discuss the design considerations of the vocabulary and present its main classes. We demonstrate the utility of the framework with a checklist designed for the publishing of Chemical Structure Linked Data using data extracted from Wikipedia as an example.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"26 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86020330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards a quantitative academic internationalization assessment of Brazilian research groups
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404456
Evelyn Perez Cervantes, J. Mena-Chalco, R. M. C. Junior
This paper introduces a new computational method to automatically estimate the International Publication Ratio (IPR) based on an analysis of the bibliographical production of Brazilian research groups, a task that would be too difficult (in many cases, impossible) to perform manually. The proposed method exploits the DOI number to identify the countries of every co-author who participated in each publication. Using bibliometric data from the Brazilian Lattes platform, we show that it is possible to make a good estimation of the IPR for research groups. Calculating the IPR is important for quantitatively evaluating scientific progress and for comparing academic institutions or knowledge areas. Experiments on research groups belonging to the 100 most collaborative researchers in five major Brazilian knowledge areas confirm that our proposal is an effective way to infer the IPR.
{"title":"Towards a quantitative academic internationalization assessment of Brazilian research groups","authors":"Evelyn Perez Cervantes, J. Mena-Chalco, R. M. C. Junior","doi":"10.1109/eScience.2012.6404456","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404456","url":null,"abstract":"This paper introduces a new computational method to automatically estimate the International Publication Ratio (IPR) based on the analysis of bibliographical productions of Brazilian research groups, a task that would be too difficult (in many cases, impossible) to be performed manually. The proposed method explores the DOI number to identify the countries of every co-author who participated in each publication. Considering the bibliometric data from the Brazilian Lattes platform we show that is possible to make a good estimation of the IPR for research groups. Calculating the IPR is important in order to make a quantitative evaluation of the science progress and to establish a comparison between the academic institutions or knowledge areas. The experiments considering research groups, belonging to the 100 more collaborative researchers of five Brazilian major knowledge areas, confirm that the our proposal leads to an effective way to infer the IPR.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"1 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90126902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards next generations of software for distributed infrastructures: The European Middleware Initiative
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404415
C. Aiftimiei, A. Aimar, A. Ceccanti, M. Cecchi, A. D. Meglio, F. Estrella, Patrick Fuhrmann, E. Giorgio, B. Kónya, L. Field, J. K. Nilsen, M. Riedel, J. White
The last two decades have seen an exceptional increase in the available networking, computing and storage resources. Scientific research communities have exploited these enhanced capabilities by developing large-scale collaborations supported by distributed infrastructures. Several middleware solutions have been created to enable the usage of such infrastructures; however, having been developed separately, these solutions have often resulted in incompatible middleware and infrastructures. The European Middleware Initiative (EMI) is a collaboration, started in 2010, among the major European middleware providers (ARC, dCache, gLite, UNICORE), aiming to consolidate and evolve the existing middleware stacks, to facilitate their interoperability and their deployment on large distributed infrastructures, and to establish a sustainable model for the future maintenance and evolution of the middleware components. This paper presents the strategy followed to achieve these goals: after an analysis of the situation before EMI, we give an overview of the development strategy, followed by the most notable technical results, grouped according to the four development areas (Compute, Data, Infrastructure, Security). We then illustrate the rigorous process ensuring the quality of the provided software, describe the release process, and discuss relations with the user communities. The last section provides an outlook on the future, focusing on ongoing actions toward the sustainability of these activities.
{"title":"Towards next generations of software for distributed infrastructures: The European Middleware Initiative","authors":"C. Aiftimiei, A. Aimar, A. Ceccanti, M. Cecchi, A. D. Meglio, F. Estrella, Patrick Fuhrmam, E. Giorgio, B. Kónya, L. Field, J. K. Nilsen, M. Riedel, J. White","doi":"10.1109/eScience.2012.6404415","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404415","url":null,"abstract":"The last two decades have seen an exceptional increase of the available networking, computing and storage resources. Scientific research communities have exploited these enhanced capabilities developing large scale collaborations, supported by distributed infrastructures. In order to enable usage of such infrastructures, several middleware solutions have been created. However such solutions, having been developed separately, have been resulting often in incompatible middleware and infrastructures. The European Middleware Initiative (EMI) is a collaboration, started in 2010, among the major European middleware providers (ARC, dCache, gLite, UNICORE), aiming to consolidate and evolve the existing middleware stacks, facilitating their interoperability and their deployment on large distributed infrastructures, establishing at the same time a sustainable model for the future maintenance and evolution of the middleware components. This paper presents the strategy followed for the achievements of these goals : after an analysis of the situation before EMI, it is given an overview of the development strategy, followed by the most notable technical results, grouped according to the four development areas (Compute, Data, Infrastructure, Security). The rigorous process ensuring the quality of provided software is then illustrated, followed by a description the release process, and of the relations with the user communities. The last section provides an outlook to the future, focusing on the undergoing actions looking toward the sustainability of activities.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"113 1","pages":"1-10"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75337595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pilot abstractions for compute, data, and network
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404459
M. Santcroos, S. Olabarriaga, D. Katz, S. Jha
Scientific experiments in a variety of domains are producing increasing amounts of data that need to be processed efficiently. Distributed Computing Infrastructures are increasingly important in fulfilling these large-scale computational requirements.
{"title":"Pilot abstractions for compute, data, and network","authors":"M. Santcroos, S. Olabarriaga, D. Katz, S. Jha","doi":"10.1109/ESCIENCE.2012.6404459","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404459","url":null,"abstract":"Scientific experiments in a variety of domains are producing increasing amounts of data that need to be processed efficiently. Distributed Computing Infrastructures are increasingly important in fulfilling these large-scale computational requirements.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"102 1","pages":"1-2"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85780617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CINET: A cyberinfrastructure for network science
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404422
S. Abdelhamid, R. Aló, S. Arifuzzaman, P. Beckman, Md Hasanuzzaman Bhuiyan, K. Bisset, E. Fox, Geoffrey Fox, K. Hall, S. Hasan, A. Joshi, Maleq Khan, C. Kuhlman, Spencer J. Lee, J. Leidig, Hemanth Makkapati, M. Marathe, H. Mortveit, J. Qiu, S. Ravi, Z. Shams, O. Sirisaengtaksin, R. Subbiah, S. Swarup, N. Trebon, A. Vullikanti, Zhao Zhao
Networks are an effective abstraction for representing real systems. Consequently, network science is increasingly used in academia and industry to solve problems in many fields. Computations that determine structural properties and dynamical behaviors of networks are useful because they give insights into the characteristics of real systems. We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs such computations, with the following features: (i) it offers realistic networks from the literature and various random and deterministic network generators; (ii) it provides many algorithmic modules and measures to study and characterize networks; (iii) it is designed for efficient execution of complex algorithms on distributed high-performance computers so that they scale to large networks; and (iv) it is hosted with web interfaces so that those without direct access to high-performance computing resources, and those who are not computing experts, can still reap the system's benefits. It is the combination of application design and cyberinfrastructure that makes these features possible; to our knowledge, these capabilities collectively make CINET novel. We describe the system and illustrative use cases, with a focus on the CINET user.
{"title":"CINET: A cyberinfrastructure for network science","authors":"S. Abdelhamid, R. Aló, S. Arifuzzaman, P. Beckman, Md Hasanuzzaman Bhuiyan, K. Bisset, E. Fox, Geoffrey Fox, K. Hall, S. Hasan, A. Joshi, Maleq Khan, C. Kuhlman, Spencer J. Lee, J. Leidig, Hemanth Makkapati, M. Marathe, H. Mortveit, J. Qiu, S. Ravi, Z. Shams, O. Sirisaengtaksin, R. Subbiah, S. Swarup, N. Trebon, A. Vullikanti, Zhao Zhao","doi":"10.1109/eScience.2012.6404422","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404422","url":null,"abstract":"Networks are an effective abstraction for representing real systems. Consequently, network science is increasingly used in academia and industry to solve problems in many fields. Computations that determine structure properties and dynamical behaviors of networks are useful because they give insights into the characteristics of real systems. We introduce a newly built and deployed cyberinfrastructure for network science (CINET) that performs such computations, with the following features: (i) it offers realistic networks from the literature and various random and deterministic network generators; (ii) it provides many algorithmic modules and measures to study and characterize networks; (iii) it is designed for efficient execution of complex algorithms on distributed high performance computers so that they scale to large networks; and (iv) it is hosted with web interfaces so that those without direct access to high performance computing resources and those who are not computing experts can still reap the system benefits. It is a combination of application design and cyberinfrastructure that makes these features possible. To our knowledge, these capabilities collectively make CINET novel. We describe the system and illustrative use cases, with a focus on the CINET user.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"36 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73165151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Experiences in the design and implementation of a Social Cloud for Volunteer Computing
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404452
Ryan Chard, K. Bubendorfer, K. Chard
Volunteer computing provides an alternative computing paradigm for establishing the resources required to support large-scale scientific computing. The model is particularly well suited to projects that have high popularity and little available computing infrastructure. The premise of volunteer computing platforms is the contribution of computing resources by individuals for little to no gain; it is therefore difficult to attract and retain contributors. The Social Cloud for Volunteer Computing (SoCVC) aims to exploit social engineering principles and the ubiquity of social networks to increase the outreach of volunteer computing, by providing an integrated volunteer computing application and gamification algorithms based on social principles that encourage contribution. In this paper we present the development of a production SoCVC, detailing the architecture, implementation and performance of the SoCVC Facebook application, and show that the proposed approach could have a high impact on volunteer computing projects.
{"title":"Experiences in the design and implementation of a Social Cloud for Volunteer Computing","authors":"Ryan Chard, K. Bubendorfer, K. Chard","doi":"10.1109/eScience.2012.6404452","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404452","url":null,"abstract":"Volunteer computing provides an alternative computing paradigm for establishing the resources required to support large scale scientific computing. The model is particularly well suited for projects that have high popularity and little available computing infrastructure. The premise of volunteer computing platforms is the contribution of computing resources by individuals for little to no gain. It is therefore difficult to attract and retain contributors to projects. The Social Cloud for Volunteer Computing aims to exploit social engineering principles and the ubiquity of social networks to increase the outreach of volunteer computing, by providing an integrated volunteer computing application and creating gamification algorithms based on social principles to encourage contribution. In this paper we present the development of a production SoCVC, detailing the architecture, implementation and performance of the SoCVC Facebook application and show that the approach proposed could have a high impact on volunteer computing projects.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"10 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80157960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cooperative VM migration for a virtualized HPC cluster with VMM-bypass I/O devices
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404487
Ryousei Takano, H. Nakada, Takahiro Hirofuchi, Yoshio Tanaka, T. Kudoh
An HPC cloud, a flexible and robust cloud computing service dedicated to high-performance computing, is a promising future e-Science platform. In cloud computing, virtualization is widely used to achieve flexibility and security. Virtualization makes migration and checkpoint/restart of computing elements (virtual machines) easy, and such features are useful for realizing fault tolerance and server consolidation. However, in widely used virtualization schemes, I/O devices are also virtualized, and thus I/O performance is severely degraded. To cope with this problem, VMM-bypass I/O technologies such as PCI passthrough and SR-IOV, which significantly reduce the I/O overhead, have been introduced. However, such VMM-bypass I/O technologies make it impossible to migrate or checkpoint/restart virtual machines, since the virtual machines are directly attached to hardware devices. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), that enables migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without virtualization overhead during normal operation. SymVirt allows the VMM to cooperate with a message-passing layer on the guest OS; it then realizes VM-level migration and checkpoint/restart through a combination of PCI hotplug and coordination of the distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and Open MPI. All PCI devices, including InfiniBand and Myrinet, are supported without specific para-virtualized drivers, and neither the MPI runtime nor the applications need to be modified. Using the proposed mechanism, we demonstrate reactive and proactive fault-tolerance mechanisms on a virtualized InfiniBand cluster, and confirm their effectiveness using both a memory-intensive micro-benchmark and the NAS Parallel Benchmarks. Moreover, we show that postcopy live migration enables us to reduce the downtime of an application as the memory footprint increases.
{"title":"Cooperative VM migration for a virtualized HPC cluster with VMM-bypass I/O devices","authors":"Ryousei Takano, H. Nakada, Takahiro Hirofuchi, Yoshio Tanaka, T. Kudoh","doi":"10.1109/eScience.2012.6404487","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404487","url":null,"abstract":"An HPC cloud, a flexible and robust cloud computing service specially dedicated to high performance computing, is a promising future e-Science platform. In cloud computing, virtualization is widely used to achieve flexibility and security. Virtualization makes migration or checkpoint/restart of computing elements (virtual machines) easy, and such features are useful for realizing fault tolerance and server consolidations. However, in widely used virtualization schemes, I/O devices are also virtualized, and thus I/O performance is severely degraded. To cope with this problem, VMM-bypass I/O technologies, including PCI passthrough and SR-IOV, in which the I/O overhead can be significantly reduced, have been introduced. However, such VMM-bypass I/O technologies make it impossible to migrate or checkpoint/restart virtual machines, since virtual machines are directly attached to hardware devices. This paper proposes a novel and practical mechanism, called Symbiotic Virtualization (SymVirt), for enabling migration and checkpoint/restart on a virtualized cluster with VMM-bypass I/O devices, without the virtualization overhead during normal operations. SymVirt allows a VMM to cooperate with a message passing layer on the guest OS, then it realizes VM-level migration and checkpoint/restart by using a combination of a PCI hotplug and coordination of distributed VMMs. We have implemented the proposed mechanism on top of QEMU/KVM and the Open MPI system. All PCI devices, including Infiniband and Myrinet, are supported without implementing specific para-virtualized drivers; and it is not necessary to modify either of the MPI runtime and applications. Using the proposed mechanism, we demonstrate reactive and proactive FT mechanisms on a virtualized Infiniband cluster. We have confirmed the effectiveness using both a memory intensive micro benchmark and the NAS parallel benchmark. Moreover, we also show that postcopy live migration enables us to reduce the down time of an application as the memory footprint increases.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"4 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85319750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
WorkflowSim: A toolkit for simulating scientific workflows in distributed environments
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404430
Weiwei Chen, E. Deelman
Simulation is one of the most popular evaluation methods in scientific workflow studies. However, existing workflow simulators fail to provide a framework that takes heterogeneous system overheads and failures into consideration, and they lack support for widely used workflow optimization techniques such as task clustering. In this paper we introduce WorkflowSim, which extends the existing CloudSim simulator with a higher layer of workflow management. We also show that ignoring system overheads and failures when simulating scientific workflows can cause significant inaccuracies in the predicted workflow runtime. To further validate its value in promoting other research, we introduce two promising research areas for which WorkflowSim provides a unique and effective evaluation platform.
{"title":"WorkflowSim: A toolkit for simulating scientific workflows in distributed environments","authors":"Weiwei Chen, E. Deelman","doi":"10.1109/eScience.2012.6404430","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404430","url":null,"abstract":"Simulation is one of the most popular evaluation methods in scientific workflow studies. However, existing workflow simulators fail to provide a framework that takes into consideration heterogeneous system overheads and failures. They also lack the support for widely used workflow optimization techniques such as task clustering. In this paper, we introduce WorkflowSim, which extends the existing CloudSim simulator by providing a higher layer of workflow management. We also indicate that to ignore system overheads and failures in simulating scientific workflows could cause significant inaccuracies in the predicted workflow runtime. To further validate its value in promoting other research work, we introduce two promising research areas for which WorkflowSim provides a unique and effective evaluation platform.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"19 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76949993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}