
Latest publications from the 2012 IEEE 8th International Conference on E-Science

High-performance data management for genome sequencing centers using Globus Online: A case study
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404443
Dinanath Sulakhe, R. Kettimuthu, Utpal J. Davé
In the past few years in the biomedical field, the availability of low-cost sequencing methods in the form of next-generation sequencing has revolutionized the approaches life science researchers are taking to gain a better understanding of the causative factors of diseases. With biomedical researchers getting many of their patients' DNA and RNA sequenced, sequencing centers are working with hundreds of researchers, with terabytes to petabytes of data for each researcher. The unprecedented scale at which genomic sequence data is generated today by high-throughput technologies requires sophisticated and high-performance methods of data handling and management. For the most part, however, the state of the art is to ship the data on hard disks. As data volumes reach tens or even hundreds of terabytes, such approaches become increasingly impractical. Data stored on portable media can be easily lost, and typically is not readily accessible to all members of a collaboration. In this paper, we discuss the application of Globus Online within a sequencing facility to address the data movement and management challenges that arise from the exponentially increasing amount of data being generated by a rapidly growing number of research groups. We also present the unique challenges in applying a Globus Online solution in sequencing center environments and how we overcame those challenges.
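The abstract's claim that shipping disks becomes impractical at this scale can be made concrete with a back-of-envelope estimate of network transfer time (the function and the figures below are illustrative assumptions, not numbers from the paper):

```python
def transfer_days(data_tb: float, throughput_gbps: float, efficiency: float = 0.8) -> float:
    """Days needed to move `data_tb` terabytes over a link sustaining
    `throughput_gbps` gigabits/s at the given fractional efficiency."""
    bits = data_tb * 1e12 * 8                          # terabytes -> bits
    seconds = bits / (throughput_gbps * 1e9 * efficiency)
    return seconds / 86400                             # seconds -> days

# e.g. 100 TB over a 10 Gb/s link at 80% sustained efficiency:
print(round(transfer_days(100, 10), 2))  # 1.16 days
```

At such rates a managed wide-area transfer finishes in about a day, whereas preparing, shipping and ingesting portable disks typically takes far longer and offers no access to remote collaborators in the meantime.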
Citations: 4
From scripts towards provenance inference
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404467
M. R. Huq, P. Apers, A. Wombacher, Y. Wada, L. V. Beek
Scientists require provenance information either to validate their model or to investigate the origin of an unexpected value. In practice, however, they rarely maintain any provenance information, and even an explicit design of the processing workflow is uncommon. Therefore, in this paper, we propose a solution that builds the workflow provenance graph by interpreting the scripts used for the actual processing. Scientists can then request fine-grained provenance information, facilitated by the inferred workflow provenance. We also provide a guideline for customizing the workflow provenance graph based on user preferences. Our evaluation shows that the proposed approach is relevant and suitable for scientists to manage provenance.
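The core idea, inferring a provenance graph by statically interpreting processing scripts, can be sketched for a toy assignment-style script syntax (the syntax and function below are illustrative assumptions, not the paper's actual inference algorithm):

```python
import re

def infer_provenance(script: str):
    """Build workflow provenance edges (input, operation, output) by
    statically interpreting processing steps written in the toy form
    'output = operation(input1, input2, ...)'."""
    edges = []
    pattern = re.compile(r"(\S+)\s*=\s*(\w+)\((.*)\)")
    for line in script.splitlines():
        m = pattern.match(line.strip())
        if not m:
            continue  # skip lines that are not processing steps
        output, op, args = m.groups()
        for inp in filter(None, (a.strip() for a in args.split(","))):
            edges.append((inp, op, output))
    return edges

script = """
clean.csv = filter(raw.csv)
model.out = simulate(clean.csv, params.txt)
"""
print(infer_provenance(script))
```

Each edge records which input a processing step consumed to produce which output; chaining the edges yields the workflow provenance graph from which fine-grained provenance queries can be answered.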
Citations: 2
Remote phenology: Applying machine learning to detect phenological patterns in a cerrado savanna
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404438
J. Almeida, J. A. D. Santos, Bruna Alberton, R. Torres, L. Morellato
Plant phenology has gained importance in the context of global change research, stimulating the development of new technologies for phenological observation. Digital cameras have been successfully used as multi-channel imaging sensors, providing measures of leaf color change information (RGB channels), or leafing phenological changes in plants. We monitored leaf-changing patterns of cerrado-savanna vegetation by taking daily digital images. We extracted the RGB channels from the digital images and correlated them with phenological changes. Our first goals were: (1) to test whether the color change information is able to characterize the phenological pattern of a group of species; and (2) to test whether individuals from the same functional group can be automatically identified using digital images. In this paper, we present a machine learning approach to detect phenological patterns in the digital images. Our preliminary results indicate that: (1) extreme hours (morning and afternoon) are the best for identifying plant species; and (2) different plant species present different behavior with respect to the color change information. Based on those results, we suggest that individuals from the same functional group might be identified using digital images, and we introduce a new tool to help phenology experts with species identification and location on the ground.
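The RGB-channel analysis described above is commonly summarized in camera phenology by chromatic coordinates such as the green chromatic coordinate, Gcc = G / (R + G + B). A minimal sketch (Gcc is a standard index in the field; whether it matches the paper's exact features is an assumption):

```python
def green_chromatic_coordinate(pixels):
    """Mean green chromatic coordinate Gcc = G / (R + G + B) over a list
    of (R, G, B) pixels; higher values indicate greener (leafier) canopy."""
    total = 0.0
    for r, g, b in pixels:
        s = r + g + b
        total += g / s if s else 0.0
    return total / len(pixels)

# A leafy (green-dominant) patch vs. a senescent (brown/red) patch:
leafy = [(60, 140, 50), (55, 150, 45)]
dry = [(140, 90, 40), (150, 85, 35)]
print(green_chromatic_coordinate(leafy) > green_chromatic_coordinate(dry))  # True
```

Tracking such an index per species patch over daily images yields the time series in which leaf-flushing and senescence patterns can be detected and classified.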
Citations: 19
Fast confidential search for bio-medical data using Bloom filters and Homomorphic Cryptography
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404484
H. Perl, Yassene Mohammed, Michael Brenner, Matthew Smith
Data protection is a challenge when outsourcing medical analysis, especially when dealing with patient-related data. While transfer channels can be secured using encryption mechanisms, protecting the data during analysis is difficult, as it usually involves processing steps on the plain data. A common use case in bioinformatics is a scientist searching for a biological sequence of amino acids or DNA nucleotides in a library or database of sequences to identify similarities. Most such search algorithms are optimized for speed, with little or no consideration for data protection. Fast algorithms are especially necessary because of the immense search space represented, for instance, by the genome or proteome of complex organisms. We propose a new secure exact term search algorithm based on Bloom filters. Our algorithm retains data privacy by using Obfuscated Bloom filters while maintaining the performance needed for real-life applications. The results can then be further aggregated using Homomorphic Cryptography to allow exact-match searching. The proposed system facilitates outsourcing exact term search over sensitive data to on-demand resources in a way that conforms to best practice in data protection.
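The data structure underlying the scheme can be illustrated with a plain Bloom filter for exact-term membership tests (a minimal sketch without the obfuscation and homomorphic-aggregation layers the paper adds; parameters are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per term over a fixed bit
    array.  Membership tests have no false negatives; false positives
    become likely only as the array fills up."""
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits)

    def _positions(self, term: str):
        # Derive k independent positions by salting one cryptographic hash.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{term}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, term: str):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def __contains__(self, term: str):
        return all(self.bits[pos] for pos in self._positions(term))

bf = BloomFilter()
for kmer in ("ACGTAC", "GGCTTA", "TTACGG"):
    bf.add(kmer)
print("ACGTAC" in bf)  # True
```

Because only hashed bit positions are stored, a filter reveals far less about its contents than the raw sequences do, which is the property the obfuscation and homomorphic layers build on.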
Citations: 20
FRED Navigator: An interactive system for visualizing results from large-scale epidemic simulations
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404444
Jack Paparian, Shawn T. Brown, D. Burke, J. Grefenstette
Large-scale simulations are increasingly used to evaluate potential public health interventions in epidemics such as the H1N1 pandemic of 2009. Because of variations in both disease scenarios and interventions, it is typical to run thousands of simulations as part of a given study. This paper addresses the challenge of visualizing the results from a large number of simulation runs. We describe a new tool called FRED Navigator that allows a user to interactively visualize results from the FRED agent-based modeling system.
Citations: 8
GridFTP based real-time data movement architecture for x-ray photon correlation spectroscopy at the Advanced Photon Source
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404466
S. Narayanan, T. Madden, A. Sandy, R. Kettimuthu, M. Link
X-ray photon correlation spectroscopy (XPCS) is a unique tool for studying the dynamical properties of a wide range of materials over a wide spatial and temporal range. XPCS measures the correlated changes in the speckle pattern, produced when a coherent x-ray beam is scattered from a disordered sample, over a time series of area-detector images. The technique rides on “Big Data” and relies heavily on high-performance computing (HPC) techniques. In this paper, we propose a high-speed data movement architecture for moving data within the Advanced Photon Source (APS) as well as between the APS and the users' institutions. We describe the challenges involved in the internal data movement and a GridFTP-based solution that enables more efficient usage of APS beam time. The implementation of a GridFTP plugin as part of the data acquisition system at the Advanced Photon Source, for real-time data transfer to the HPC system for data analysis, is discussed.
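The quantity XPCS extracts from the image time series is the normalized intensity autocorrelation g2. A schematic single-pixel sketch (the actual pipeline computes this over full series of area-detector images, per speckle region):

```python
def g2(intensity, lag):
    """Normalized intensity autocorrelation
    g2(tau) = <I(t) * I(t + tau)> / <I>^2 for one detector pixel,
    averaged over a time series of intensity samples."""
    n = len(intensity) - lag
    num = sum(intensity[t] * intensity[t + lag] for t in range(n)) / n
    mean = sum(intensity) / len(intensity)
    return num / (mean * mean)

# A static (constant-intensity) speckle gives g2 = 1; fluctuating speckle
# raises g2 at short lags, decaying toward 1 as correlations are lost.
print(g2([2.0] * 8, 1))  # 1.0
```

Evaluating this for every pixel and every lag over thousands of frames is what makes the analysis data- and compute-intensive, hence the real-time GridFTP movement of frames to the HPC system.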
Citations: 2
Overview of the TriBITS lifecycle model: A Lean/Agile software lifecycle model for research-based computational science and engineering software
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404448
R. Bartlett, M. Heroux, J. Willenbring
Software lifecycles are becoming an increasingly important issue for computational science & engineering (CSE) software. The process by which a piece of CSE software begins life as a set of research requirements and then matures into a trusted high-quality capability is both commonplace and extremely challenging. Although an implicit lifecycle is obviously in use in any effort, the challenges of this process, respecting the competing needs of research versus production, cannot be overstated. Here we describe a proposal for a well-defined software lifecycle process based on modern Lean/Agile software engineering principles. What we propose is appropriate for many CSE software projects that are initially heavily focused on research but are also expected to eventually produce usable, high-quality capabilities. The model is related to TriBITS, a build, integration and testing system, which serves as a strong foundation for this lifecycle model, and aspects of this lifecycle model are ingrained in the TriBITS system. Indeed, this lifecycle process, if followed, will enable large-scale, sustainable integration of many complex CSE software efforts across several institutions.
Citations: 8
X-ray imaging software tools for HPC clusters and the Cloud
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404464
D. Thompson, A. Khassapov, Y. Nesterets, T. Gureyev, John A. Taylor
Computed Tomography (CT) is a non-destructive imaging technique widely used across many scientific, industrial and medical fields. It is both computationally and data intensive, and can therefore benefit from infrastructure in the “supercomputing” domain for research purposes, such as Synchrotron science. Our group within CSIRO has been actively developing X-ray tomography and image processing software and systems for HPC clusters. We have also leveraged GPUs (Graphics Processing Units) for several codes, enabling speedups of an order of magnitude or more over CPU-only implementations. A key goal of our systems is to give our targeted “end users”, researchers, easy access to the tools, computational resources and data via familiar interfaces and client applications, such that specialized HPC expertise and support is generally not required to initiate and control data processing, analysis and visualization workflows. We have strived to enable the use of HPC facilities in an interactive fashion, similar to the familiar Windows desktop environment, in contrast to the traditional batch-job oriented environment that is still the norm at most HPC installations. Several collaborations have been formed, and we currently have our systems deployed on two clusters within CSIRO, Australia. In a major installation at the Australian Synchrotron (the MASSIVE GPU cluster), the system has been integrated with the Imaging and Medical Beamline (IMBL) detector to provide rapid on-demand CT-reconstruction and visualization capabilities to researchers both on-site and remotely. A smaller-scale installation has also been deployed on a mini-cluster at the Shanghai Synchrotron Radiation Facility (SSRF) in China. All clusters run the Windows HPC Server 2008 R2 operating system. The two large clusters running our software, MASSIVE and CSIRO Bragg, are currently configured as “hybrid clusters” in which individual nodes can be dual-booted between Linux and Windows as demand requires. We have also recently explored the adaptation of our CT-reconstruction code to Cloud infrastructure, and have constructed a working “proof-of-concept” system for the Microsoft Azure Cloud. However, at this stage several challenges remain to be met in order to make it a truly viable alternative to our HPC cluster solution. Recently, CSIRO was successful in its proposal to develop eResearch tools for the Australian Government funded NeCTAR Research Cloud. As part of this project our group will be contributing CT and imaging processing components.
Citations: 3
On realizing the concept study ScienceSoft of the European Middleware Initiative: Open Software for Open Science
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404450
A. D. Meglio, F. Estrella, M. Riedel
In September 2011 the European Middleware Initiative (EMI) started discussing the feasibility of creating an open source community for science with other projects like EGI, StratusLab, OpenAIRE, iMarine, and IGE, SMEs like DCore, Maat, SixSq, SharedObjects, communities like WLCG and LSGC. The general idea of establishing an open source community dedicated to software for scientific applications was understood and appreciated by most people. However, the lack of a precise definition of goals and scope is a limiting factor that has also made many people sceptical of the initiative. In order to understand more precisely what such an open source initiative should do and how, EMI has started a more formal feasibility study around a concept called ScienceSoft - Open Software for Open Science. A group of people from interested parties was created in December 2011 to be the ScienceSoft Steering Committee with the short-term mandate to formalize the discussions about the initiative and produce a document with an initial high-level description of the motivations, issues and possible solutions and a general plan to make it happen. The conclusions of the initial investigation were presented at CERN in February 2012 at a ScienceSoft Workshop organized by EMI. Since then, presentations of ScienceSoft have been made on various occasions: in Amsterdam in January 2012 at the EGI Workshop on Sustainability, in Taipei in February at the ISGC 2012 conference, in Munich in March at the EGI/EMI Conference, and at OGF 34 in March. This paper presents the ScienceSoft concept study as an overview distributed to the broader scientific community for critique.
{"title":"On realizing the concept study ScienceSoft of the European Middleware Initiative: Open Software for Open Science","authors":"A. D. Meglio, F. Estrella, M. Riedel","doi":"10.1109/eScience.2012.6404450","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404450","url":null,"abstract":"In September 2011 the European Middleware Initiative (EMI) started discussing the feasibility of creating an open source community for science with other projects like EGI, StratusLab, OpenAIRE, iMarine, and IGE, SMEs like DCore, Maat, SixSq, SharedObjects, communities like WLCG and LSGC. The general idea of establishing an open source community dedicated to software for scientific applications was understood and appreciated by most people. However, the lack of a precise definition of goals and scope is a limiting factor that has also made many people sceptical of the initiative. In order to understand more precisely what such an open source initiative should do and how, EMI has started a more formal feasibility study around a concept called ScienceSoft - Open Software for Open Science. A group of people from interested parties was created in December 2011 to be the ScienceSoft Steering Committee with the short-term mandate to formalize the discussions about the initiative and produce a document with an initial high-level description of the motivations, issues and possible solutions and a general plan to make it happen. The conclusions of the initial investigation were presented at CERN in February 2012 at a ScienceSoft Workshop organized by EMI. Since then, presentations of ScienceSoft have been made on various occasions: in Amsterdam in January 2012 at the EGI Workshop on Sustainability, in Taipei in February at the ISGC 2012 conference, in Munich in March at the EGI/EMI Conference, and at OGF 34 in March. This paper presents the ScienceSoft concept study as an overview distributed to the broader scientific community for critique.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"30 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74068591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Provenance analysis: Towards quality provenance
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404480
Y. Cheah, Beth Plale
Data provenance, a key piece of metadata that describes the lifecycle of a data product, is crucial in aiding scientists to better understand and facilitate reproducibility and reuse of scientific results. Provenance collection systems often capture provenance on the fly, and the protocol between the application and the provenance tool may not be reliable. As a result, data provenance can become ambiguous or simply inaccurate. In this paper, we identify likely quality issues in data provenance. We also establish crucial quality dimensions that are especially critical for the evaluation of provenance quality. We analyze synthetic and real-world provenance based on these quality dimensions and summarize our contributions to provenance quality.
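The abstract refers to quality issues and quality dimensions without enumerating them here. As a hedged illustration only (the record fields and the two dimensions checked, completeness and temporal consistency, are this sketch's assumptions, not the paper's taxonomy), a record-level quality check might look like:

```python
from datetime import datetime

# Fields assumed required for a "complete" provenance record (illustrative).
REQUIRED = ("id", "activity", "started", "ended", "inputs", "outputs")

def assess_provenance(record):
    """Return a list of quality issues for one provenance record:
    completeness (all required fields present and non-empty) and
    temporal consistency (activity start precedes its end)."""
    issues = []
    for field in REQUIRED:
        if not record.get(field):
            issues.append(f"missing or empty field: {field}")
    started, ended = record.get("started"), record.get("ended")
    if started and ended:
        if datetime.fromisoformat(started) > datetime.fromisoformat(ended):
            issues.append("activity ends before it starts")
    return issues

# A record captured "on the fly" with a dropped output list and swapped timestamps.
rec = {"id": "run-42", "activity": "align_reads",
       "started": "2012-10-08T09:00:00", "ended": "2012-10-08T08:00:00",
       "inputs": ["sample.fastq"], "outputs": []}
print(assess_provenance(rec))
# → ['missing or empty field: outputs', 'activity ends before it starts']
```

Real systems (e.g. ones built on the W3C PROV data model) would check many more dimensions, but the same pattern applies: score each record against explicit dimensions rather than trusting the capture protocol.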
{"title":"Provenance analysis: Towards quality provenance","authors":"Y. Cheah, Beth Plale","doi":"10.1109/eScience.2012.6404480","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404480","url":null,"abstract":"Data provenance, a key piece of metadata that describes the lifecycle of a data product, is crucial in aiding scientists to better understand and facilitate reproducibility and reuse of scientific results. Provenance collection systems often capture provenance on the fly and the protocol between application and provenance tool may not be reliable. As a result, data provenance can become ambiguous or simply inaccurate. In this paper, we identify likely quality issues in data provenance. We also establish crucial quality dimensions that are especially critical for the evaluation of provenance quality. We analyze synthetic and real-world provenance based on these quality dimensions and summarize our contributions to provenance quality.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":"81 5","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2012-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72607827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 28