Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404443
Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé
In the past few years in the biomedical field, the availability of low-cost sequencing methods in the form of next-generation sequencing has revolutionized the approaches life science researchers are undertaking in order to gain a better understanding of the causative factors of diseases. With biomedical researchers getting many of their patients' DNA and RNA sequenced, sequencing centers are working with hundreds of researchers, with terabytes to petabytes of data for each researcher. The unprecedented scale at which genomic sequence data is generated today by high-throughput technologies requires sophisticated and high-performance methods of data handling and management. For the most part, however, the state of the art is to ship the data on hard disks. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media can be easily lost and typically is not readily accessible to all members of the collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise from the exponentially increasing amount of data generated by a rapidly growing number of research groups. We also present the unique challenges in applying a Globus Online solution in sequencing center environments and how we overcame those challenges.
{"title":"High-performance data management for genome sequencing centers using Globus Online: A case study","authors":"Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé","doi":"10.1109/eScience.2012.6404443","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404443","url":null,"abstract":"In the past few years in the biomedical field, availability of low-cost sequencing methods in the form of next-generation sequencing has revolutionized the approaches life science researchers are undertaking in order to gain a better understanding of the causative factors of diseases. With biomedical researchers getting many of their patients' DNA and RNA sequenced, sequencing centers are working with hundreds of researchers with terabytes to petabytes of data for each researcher. The unprecedented scale at which genomic sequence data is generated today by high-throughput technologies requires sophisticated and high-performance methods of data handling and management. For the most part, however, the state of the art is to use hard disks to ship the data. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media can be easily lost, and typically is not readily accessible to all members of the collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise as a result of exponentially increasing amount of data being generated by a rapidly growing number of research groups. We also present the unique challenges in applying a Globus Online solution in sequencing center environments and how we overcome those challenges.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"127 1","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90263422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404467
M. R. Huq, P. Apers, A. Wombacher, Y. Wada, L. V. Beek
Scientists require provenance information either to validate their model or to investigate the origin of an unexpected value. In practice, however, they rarely maintain any provenance information, and even explicitly designing the processing workflow is uncommon. Therefore, in this paper, we propose a solution that builds the workflow provenance graph by interpreting the scripts used for the actual processing. Further, scientists can request fine-grained provenance information facilitated by the inferred workflow provenance. We also provide a guideline for customizing the workflow provenance graph based on user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.
{"title":"From scripts towards provenance inference","authors":"M. R. Huq, P. Apers, A. Wombacher, Y. Wada, L. V. Beek","doi":"10.1109/ESCIENCE.2012.6404467","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404467","url":null,"abstract":"Scientists require provenance information either to validate their model or to investigate the origin of an unexpected value. However, they do not maintain any provenance information and even designing the processing workflow is rare in practice. Therefore, in this paper, we propose a solution that can build the workflow provenance graph by interpreting the scripts used for actual processing. Further, scientists can request fine-grained provenance information facilitating the inferred workflow provenance. We also provide a guideline to customize the workflow provenance graph based on user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"13 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83650847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404438
J. Almeida, J. A. D. Santos, Bruna Alberton, R. Torres, L. Morellato
Plant phenology has gained importance in the context of global change research, stimulating the development of new technologies for phenological observation. Digital cameras have been successfully used as multi-channel imaging sensors, providing measures of leaf color change information (RGB channels), or leafing phenological changes in plants. We monitored the leaf-changing patterns of a cerrado-savanna vegetation by taking daily digital images. We extracted the RGB channels from the digital images and correlated them with the phenological changes. Our first goals were: (1) to test whether the color change information is able to characterize the phenological pattern of a group of species; and (2) to test whether individuals from the same functional group can be automatically identified using digital images. In this paper, we present a machine learning approach to detect phenological patterns in the digital images. Our preliminary results indicate that: (1) the extreme hours (morning and afternoon) are the best for identifying plant species; and (2) different plant species behave differently with respect to the color change information. Based on these results, we suggest that individuals from the same functional group might be identified using digital images, and we introduce a new tool to help phenology experts in species identification and location on the ground.
{"title":"Remote phenology: Applying machine learning to detect phenological patterns in a cerrado savanna","authors":"J. Almeida, J. A. D. Santos, Bruna Alberton, R. Torres, L. Morellato","doi":"10.1109/ESCIENCE.2012.6404438","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404438","url":null,"abstract":"Plant phenology has gained importance in the context of global change research, stimulating the development of new technologies for phenological observation. Digital cameras have been successfully used as multi-channel imaging sensors, providing measures of leaf color change information (RGB channels), or leafing phenological changes in plants. We monitored leaf-changing patterns of a cerrado-savanna vegetation by taken daily digital images. We extract RGB channels from digital images and correlated with phenological changes. Our first goals were: (1) to test if the color change information is able to characterize the phenological pattern of a group of species; and (2) to test if individuals from the same functional group may be automatically identified using digital images. In this paper, we present a machine learning approach to detect phenological patterns in the digital images. Our preliminary results indicate that: (1) extreme hours (morning and afternoon) are the best for identifying plant species; and (2) different plant species present a different behavior with respect to the color change information. Based on those results, we suggest that individuals from the same functional group might be identified using digital images, and introduce a new tool to help phenology experts in the species identification and location on-the-ground.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"226 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83612923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404484
H. Perl, Yassene Mohammed, Michael Brenner, Matthew Smith
Data protection is a challenge when outsourcing medical analysis, especially if one is dealing with patient-related data. While securing transfer channels is possible using encryption mechanisms, protecting the data during analysis is difficult, as it usually involves processing steps on the plain data. A common use case in bioinformatics is when a scientist searches for a biological sequence of amino acids or DNA nucleotides in a library or database of sequences to identify similarities. Most such search algorithms are optimized for speed, with little or no consideration for data protection. Fast algorithms are especially necessary because of the immense search space represented, for instance, by the genome or proteome of complex organisms. We propose a new secure exact term search algorithm based on Bloom filters. Our algorithm retains data privacy by using Obfuscated Bloom filters while maintaining the performance needed for real-life applications. The results can then be further aggregated using Homomorphic Cryptography to allow exact-match searching. The proposed system facilitates outsourcing exact term search of sensitive data to on-demand resources in a way that conforms to best practice in data protection.
{"title":"Fast confidential search for bio-medical data using Bloom filters and Homomorphic Cryptography","authors":"H. Perl, Yassene Mohammed, Michael Brenner, Matthew Smith","doi":"10.1109/eScience.2012.6404484","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404484","url":null,"abstract":"Data protection is a challenge when outsourcing medical analysis, especially if one is dealing with patient related data. While securing transfer channels is possible using encryption mechanisms, protecting the data during analyses is difficult as it usually involves processing steps on the plain data. A common use case in bioinformatics is when a scientist searches for a biological sequence of amino acids or DNA nucleotides in a library or database of sequences to identify similarities. Most such search algorithms are optimized for speed with less or no consideration for data protection. Fast algorithms are especially necessary because of the immense search space represented for instance by the genome or proteome of complex organisms. We propose a new secure exact term search algorithm based on Bloom filters. Our algorithm retains data privacy by using Obfuscated Bloom filters while maintaining the performance needed for real-life applications. The results can then be further aggregated using Homomorphic Cryptography to allow exact-match searching. The proposed system facilitates outsourcing exact term search of sensitive data to on-demand resources in a way which conforms to best practice of data protection.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"20 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73072583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404444
Jack Paparian, Shawn T. Brown, D. Burke, J. Grefenstette
Large-scale simulations are increasingly used to evaluate potential public health interventions in epidemics such as the H1N1 pandemic of 2009. Due to variations in both disease scenarios and interventions, it is typical to run thousands of simulations as part of a given study. This paper addresses the challenge of visualizing the results from a large number of simulation runs. We describe a new tool called FRED Navigator that allows a user to interactively visualize results from the FRED agent-based modeling system.
{"title":"FRED Navigator: An interactive system for visualizing results from large-scale epidemic simulations","authors":"Jack Paparian, Shawn T. Brown, D. Burke, J. Grefenstette","doi":"10.1109/ESCIENCE.2012.6404444","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404444","url":null,"abstract":"Large-scale simulations are increasingly used to evaluate potential public health interventions in epidemics such as the H1N1 pandemic of 2009. Due to variations in both disease scenarios and in interventions, it is typical to run thousands of simulations as part of a given study. This paper addresses the challenge of visualizing the results from a large number of simulation runs. We describe a new tool called FRED Navigator that allows a user to interactively visualize results from the FRED agent-based modeling system.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"352 1","pages":"1-5"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75494213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404466
S. Narayanan, T. Madden, A. Sandy, R. Kettimuthu, M. Link
X-ray photon correlation spectroscopy (XPCS) is a unique tool for studying the dynamical properties of a wide range of materials over a wide spatial and temporal range. XPCS measures the correlated changes in the speckle pattern, produced when a coherent x-ray beam is scattered from a disordered sample, over a time series of area detector images. The technique rides on “Big Data” and relies heavily on high-performance computing (HPC) techniques. In this paper, we propose a high-speed data movement architecture for moving data within the Advanced Photon Source (APS) as well as between APS and the users' institutions. We describe the challenges involved in the internal data movement and a GridFTP-based solution that enables more efficient usage of APS beam time. The implementation of a GridFTP plugin as part of the data acquisition system at the Advanced Photon Source, which transfers data in real time to the HPC system for analysis, is also discussed.
{"title":"GridFTP based real-time data movement architecture for x-ray photon correlation spectroscopy at the Advanced Photon Source","authors":"S. Narayanan, T. Madden, A. Sandy, R. Kettimuthu, M. Link","doi":"10.1109/eScience.2012.6404466","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404466","url":null,"abstract":"X-ray photon correlation spectroscopy (XPCS) is a unique tool to study the dynamical properties in a wide range of materials over a wide spatial and temporal range. XPCS measures the correlated changes in the speckle pattern, produced when a coherent x-ray beam is scattered from a disordered sample, over a time series of area detector images. The technique rides on “Big Data” and relies heavily on high performance computing (HPC) techniques. In this paper, we propose a highspeed data movement architecture for moving data within the Advanced Photon Source (APS) as well as between APS and the users' institutions. We describe the challenges involved in the internal data movement and a GridFTP-based solution that enables more efficient usage of the APS beam time. The implementation of GridFTP plugin as part of the data acquisition system at the Advanced Photon Source for real time data transfer to the HPC system for data analysis is discussed.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"19 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84807518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404448
R. Bartlett, M. Heroux, J. Willenbring
Software lifecycles are becoming an increasingly important issue for computational science & engineering (CSE) software. The process by which a piece of CSE software begins life as a set of research requirements and then matures into a trusted high-quality capability is both commonplace and extremely challenging. Although an implicit lifecycle is obviously being used in any effort, the challenges of this process, respecting the competing needs of research versus production, cannot be overstated. Here we describe a proposal for a well-defined software lifecycle process based on modern Lean/Agile software engineering principles. What we propose is appropriate for many CSE software projects that are initially heavily focused on research but are also expected to eventually produce usable high-quality capabilities. The model is related to TriBITS, a build, integration, and testing system, which serves as a strong foundation for this lifecycle model, and aspects of the lifecycle model are ingrained in the TriBITS system. Indeed, this lifecycle process, if followed, will enable large-scale sustainable integration of many complex CSE software efforts across several institutions.
{"title":"Overview of the TriBITS lifecycle model: A Lean/Agile software lifecycle model for research-based computational science and engineering software","authors":"R. Bartlett, M. Heroux, J. Willenbring","doi":"10.1109/eScience.2012.6404448","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404448","url":null,"abstract":"Software lifecycles are becoming an increasingly important issue for computational science & engineering (CSE) software. The process by which a piece of CSE software begins life as a set of research requirements and then matures into a trusted high-quality capability is both commonplace and extremely challenging. Although an implicit lifecycle is obviously being used in any effort, the challenges of this process-respecting the competing needs of research vs. production-cannot be overstated. Here we describe a proposal for a well-defined software life-cycle process based on modern Lean/Agile software engineering principles. What we propose is appropriate for many CSE software projects that are initially heavily focused on research but also are expected to eventually produce usable high-quality capabilities. The model is related to TriBITS, a build, integration and testing system, which serves as a strong foundation for this lifecycle model, and aspects of this lifecycle model are ingrained in the TriBITS system. Indeed this lifecycle process, if followed, will enable large-scale sustainable integration of many complex CSE software efforts across several institutions.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"43 5 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88850335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/ESCIENCE.2012.6404464
D. Thompson, A. Khassapov, Y. Nesterets, T. Gureyev, John A. Taylor
Computed Tomography (CT) is a non-destructive imaging technique widely used across many scientific, industrial and medical fields. It is both computationally and data intensive, and therefore can benefit from infrastructure in the “supercomputing” domain for research purposes, such as Synchrotron science. Our group within CSIRO has been actively developing X-ray tomography and image processing software and systems for HPC clusters. We have also leveraged GPUs (Graphics Processing Units) for several codes, enabling speedups of an order of magnitude or more over CPU-only implementations. A key goal of our systems is to give our targeted “end users”, researchers, easy access to the tools, computational resources and data via familiar interfaces and client applications, such that specialized HPC expertise and support is generally not required in order to initiate and control data processing, analysis and visualization workflows. We have strived to enable the use of HPC facilities in an interactive fashion, similar to the familiar Windows desktop environment, in contrast to the traditional batch-job oriented environment that is still the norm at most HPC installations. Several collaborations have been formed, and we currently have our systems deployed on two clusters within CSIRO, Australia. A major installation at the Australian Synchrotron (the MASSIVE GPU cluster) integrates the system with the Imaging and Medical Beamline (IMBL) detector to provide rapid on-demand CT reconstruction and visualization capabilities to researchers both on-site and remotely. A smaller-scale installation has also been deployed on a mini-cluster at the Shanghai Synchrotron Radiation Facility (SSRF) in China. All clusters run the Windows HPC Server 2008 R2 operating system. The two large clusters running our software, MASSIVE and CSIRO Bragg, are currently configured as “hybrid clusters” in which individual nodes can be dual-booted between Linux and Windows as demand requires. We have also recently explored the adaptation of our CT reconstruction code to Cloud infrastructure, and have constructed a working “proof-of-concept” system for the Microsoft Azure Cloud. However, at this stage several challenges remain to be met in order to make it a truly viable alternative to our HPC cluster solution. Recently, CSIRO was successful in its proposal to develop eResearch tools for the Australian Government funded NeCTAR Research Cloud. As part of this project our group will be contributing CT and imaging processing components.
{"title":"X-ray imaging software tools for HPC clusters and the Cloud","authors":"D. Thompson, A. Khassapov, Y. Nesterets, T. Gureyev, John A. Taylor","doi":"10.1109/ESCIENCE.2012.6404464","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2012.6404464","url":null,"abstract":"Computed Tomography (CT) is a non-destructive imaging technique widely used across many scientific, industrial and medical fields. It is both computationally and data intensive, and therefore can benefit from infrastructure in the “supercomputing” domain for research purposes, such as Synchrotron science. Our group within CSIRO has been actively developing X-ray tomography and image processing software and systems for HPC clusters. We have also leveraged the use of GPU's (Graphical Processing Units) for several codes enabling speedups by an order of magnitude or more over CPU-only implementations. A key goal of our systems is to enable our targeted “end users”, researchers, easy access to the tools, computational resources and data via familiar interfaces and client applications such that specialized HPC expertise and support is generally not required in order to initiate and control data processing, analysis and visualzation workflows. We have strived to enable the use of HPC facilities in an interactive fashion, similar to the familiar Windows desktop environment, in contrast to the traditional batch-job oriented environment that is still the norm at most HPC installations. Several collaborations have been formed, and we currently have our systems deployed on two clusters within CSIRO, Australia. A major installation at the Australian Synchrotron (MASSIVE GPU cluster) where the system has been integrated with the Imaging and Medical Beamline (IMBL) detector to provide rapid on-demand CT-reconstruction and visualization capabilities to researchers whilst on-site and remotely. A smaller-scale installation has also been deployed on a mini-cluster at the Shanghai Synchrotron Radiation Facility (SSRF) in China. All clusters run the Windows HPC Server 2008 R2 operating system. The two large clusters running our software, MASSIVE and CSIRO Bragg are currently configured as “hybrid clusters” in which individual nodes can be dual-booted between Linux and Windows as demand requires. We have also recently explored the adaptation of our CT-reconstruction code to Cloud infrastructure, and have constructed a working “proof-of-concept” system for the Microsoft Azure Cloud. However, at this stage several challenges remain to be met in order to make it a truly viable alternative to our HPC cluster solution. Recently, CSIRO was successful in its proposal to develop eResearch tools for the Australian Government funded NeCTAR Research Cloud. As part of this project our group will be contributing CT and imaging processing components.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"7 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80449474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404450
A. D. Meglio, F. Estrella, M. Riedel
In September 2011 the European Middleware Initiative (EMI) started discussing the feasibility of creating an open source community for science together with other projects like EGI, StratusLab, OpenAIRE, iMarine, and IGE; SMEs like DCore, Maat, SixSq, and SharedObjects; and communities like WLCG and LSGC. The general idea of establishing an open source community dedicated to software for scientific applications was understood and appreciated by most people. However, the lack of a precise definition of goals and scope is a limiting factor that has also made many people sceptical of the initiative. In order to understand more precisely what such an open source initiative should do and how, EMI has started a more formal feasibility study around a concept called ScienceSoft: Open Software for Open Science. A group of people from the interested parties was created in December 2011 as the ScienceSoft Steering Committee, with the short-term mandate to formalize the discussions about the initiative and produce a document with an initial high-level description of the motivations, issues and possible solutions, and a general plan to make it happen. The conclusions of the initial investigation were presented at CERN in February 2012 at a ScienceSoft Workshop organized by EMI. Since then, presentations of ScienceSoft have been given on various occasions: in Amsterdam in January 2012 at the EGI Workshop on Sustainability, in Taipei in February at the ISGC 2012 conference, in Munich in March at the EGI/EMI Conference, and at OGF 34 in March. This paper presents the ScienceSoft concept study as an overview distributed to the broader scientific community for critique.
{"title":"On realizing the concept study ScienceSoft of the European Middleware Initiative: Open Software for Open Science","authors":"A. D. Meglio, F. Estrella, M. Riedel","doi":"10.1109/eScience.2012.6404450","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404450","url":null,"abstract":"In September 2011 the European Middleware Initiative (EMI) started discussing the feasibility of creating an open source community for science with other projects like EGI, StratusLab, OpenAIRE, iMarine, and IGE, SMEs like DCore, Maat, SixSq, SharedObjects, communities like WLCG and LSGC. The general idea of establishing an open source community dedicated to software for scientific applications was understood and appreciated by most people. However, the lack of a precise definition of goals and scope is a limiting factor that has also made many people sceptical of the initiative. In order to understand more precisely what such an open source initiative should do and how, EMI has started a more formal feasibility study around a concept called ScienceSoft - Open Software for Open Science. A group of people from interested parties was created in December 2011 to be the ScienceSoft Steering Committee with the short-term mandate to formalize the discussions about the initiative and produce a document with an initial high-level description of the motivations, issues and possible solutions and a general plan to make it happen. The conclusions of the initial investigation were presented at CERN in February 2012 at a ScienceSoft Workshop organized by EMI. Since then, presentations of ScienceSoft have been made in various occasions, in Amsterdam in January 2012 at the EGI Workshop on Sustainability, in Taipei in February at the ISGC 2012 conference, in Munich in March at the EGI/EMI Conference and at OGF 34 in March. This paper provides information this concept study ScienceSoft as an overview distributed to the broader scientific community to critique it.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"30 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74068591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2012-10-08 | DOI: 10.1109/eScience.2012.6404480
Y. Cheah, Beth Plale
Data provenance, a key piece of metadata that describes the lifecycle of a data product, is crucial in aiding scientists to better understand scientific results and to facilitate their reproducibility and reuse. Provenance collection systems often capture provenance on the fly, and the protocol between the application and the provenance tool may not be reliable. As a result, data provenance can become ambiguous or simply inaccurate. In this paper, we identify likely quality issues in data provenance. We also establish crucial quality dimensions that are especially critical for the evaluation of provenance quality. We analyze synthetic and real-world provenance based on these quality dimensions and summarize our contributions to provenance quality.
{"title":"Provenance analysis: Towards quality provenance","authors":"Y. Cheah, Beth Plale","doi":"10.1109/eScience.2012.6404480","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404480","url":null,"abstract":"Data provenance, a key piece of metadata that describes the lifecycle of a data product, is crucial in aiding scientists to better understand and facilitate reproducibility and reuse of scientific results. Provenance collection systems often capture provenance on the fly and the protocol between application and provenance tool may not be reliable. As a result, data provenance can become ambiguous or simply inaccurate. In this paper, we identify likely quality issues in data provenance. We also establish crucial quality dimensions that are especially critical for the evaluation of provenance quality. We analyze synthetic and real-world provenance based on these quality dimensions and summarize our contributions to provenance quality.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"81 5","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72607827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}