
2009 Fifth IEEE International Conference on e-Science: Latest Papers

A Virtual Connectivity Layer for Grids
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.50
Jefferson Tan, D. Abramson, C. Enticott
Computational grids are now mainstream facilities for e-research worldwide. While enterprise grids exist within organizations, national grids have become common, usually consisting of government as well as academic facilities. Such facilities not uncommonly adopt lenient blanket policies that allow inbound and outbound grid traffic. This is far from ideal from a security perspective, but given the dynamic nature of grid use, it is impractical to maintain restrictive firewalls and manually keep up with on-demand firewall reconfiguration. Other solutions are needed that do not sacrifice security. Beyond first-generation solutions, which were mostly not generic enough, standardization work is now under way, but it is aimed exclusively at firewall virtualization. We argue for an architectural solution that encompasses firewall virtualization as well as other methods that may be more appropriate in many environments. This paper describes our notion of the missing layer between the grid and the fabric, which we call the virtual connectivity layer. We have developed two implementations within this layer and discuss how they fit into a complete, well-defined architectural solution.
Citations: 2
An Image Processing Portal and Web-Service for the Study of Ancient Documents
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.10
S. Tarte, D. Wallom, Pin Hu, Kang Tang, Tiejun Ma
Linking up two projects dedicated to facilitating the work of documentary scholars, this paper presents image processing algorithms tailored to the study of ancient documents and shows how they have been made available to users through a portal that calls a web service exploiting grid computational power. To that end, the image processing algorithms were wrapped to fit into the National Grid Service (NGS) Uniform Execution Environment; the data model of an existing Virtual Research Environment (VRE-SDM) was extended; JSR-168-compliant portlets were developed to support secure and seamless distributed image analysis; and a GridSAM interface was developed between the portal and the NGS-installed algorithms. The outcomes of the project include a web-based application, a proof of concept for the usability of the VRE-SDM platform, an opportunity for wider dissemination of the image processing algorithms, and a demonstration that the NGS can feasibly be used for humanities applications.
Citations: 4
A Protocol for Exchanging Scientific Citations
Pub Date : 2009-12-09 DOI: 10.1109/E-SCIENCE.2009.32
B. Matthews, Alastair Duncan, C. Jones, C. Neylon, Mark Borkum, S. Coles, Philip Hunter
Data and publications are major outputs of science, but they are typically managed in different ways, in data archives and institutional repositories respectively. In this paper we discuss a protocol for exchanging cross-citations between data and publications, so that the links between them can be tracked and used. We describe a simple instance of such a protocol based on the well-known Trackback protocol, and give an example of how the protocol can be used to exchange citations between a data archive and a publication repository. We conclude by discussing how the protocol can be generalised and its implications for scholarly discourse.
Citations: 4
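To make the citation-exchange idea concrete, here is a minimal sketch of a TrackBack-style ping between a publication repository (sender) and a data archive (receiver). It assumes the usual TrackBack conventions as we understand them: a form-encoded POST carrying `url` (required) plus optional `title`, `excerpt` and `blog_name`, answered by an XML `<response>` whose `<error>` is `0` on success. The names `build_ping`, `handle_ping`, `parse_response`, `target_id` and the in-memory `citations` store are hypothetical illustrations, not the authors' implementation, and no network transport is shown.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlencode

XML_DECL = '<?xml version="1.0" encoding="utf-8"?>'


def build_ping(url, title="", excerpt="", blog_name=""):
    """Sender side: form-encoded body announcing that `url` cites the target."""
    if not url:
        raise ValueError("url is required")
    fields = {"url": url}
    for key, value in (("title", title), ("excerpt", excerpt), ("blog_name", blog_name)):
        if value:  # optional fields are only sent when present
            fields[key] = value
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"}
    return headers, urlencode(fields).encode("utf-8")


def handle_ping(target_id, body, citations):
    """Receiver side (hypothetical store): record that the pinging URL cites target_id."""
    form = parse_qs(body.decode("utf-8"))
    if "url" not in form:
        return (XML_DECL + "<response><error>1</error>"
                "<message>missing url</message></response>")
    citations.setdefault(target_id, []).append(form["url"][0])
    return XML_DECL + "<response><error>0</error></response>"


def parse_response(xml_text):
    """Return (ok, message) from a TrackBack-style XML reply."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    ok = root.findtext("error", default="1").strip() == "0"
    return ok, root.findtext("message", default="")
```

For example, a repository could announce that a paper at a (hypothetical) `https://repo.example.org/paper/42` cites `dataset-7` in an archive. The archive then stores the back-link, so both sides can track the cross-citation.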
Enabling Computational Steering with an Asynchronous-Iterative Computation Framework
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.43
Al Costanzo, C. Jin, Carlos A. Varela, R. Buyya
In this paper, we present a framework that enables scientists to steer computations executing over large-scale grid computing environments. With computational steering, users can dynamically control their simulations or computations and reach the expected results more efficiently. The framework supports steerable applications by introducing an asynchronous iterative MapReduce programming model, deployed using Hadoop over a set of virtual machines running on a multi-cluster grid. To tolerate heterogeneity between sites, results are collected asynchronously, and users can interact dynamically with their computations to adjust the area of interest. In response to this interaction, the framework can redistribute the computational load among the heterogeneous sites and, where possible, use the more powerful sites to explore the user's area of interest. Our framework largely avoids the bottleneck caused by synchronisation between sites, and therefore responds to user interaction more efficiently. We illustrate and evaluate the framework with a scientific application that fits models of the Milky Way galaxy's structure to stars observed by the Sloan Digital Sky Survey.
Citations: 5
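The key idea is that results are merged as they arrive, with no barrier waiting on the slowest site, while the user can redirect the search at any time. The sketch below illustrates that pattern in miniature. It is not the authors' Hadoop/MapReduce framework. Local threads stand in for heterogeneous grid sites, a 1-D random search stands in for the model fitting, and the `steer` callback (a hypothetical name) plays the role of the user adjusting the area of interest.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def steer_search(objective, region, steer, budget=40, workers=4, seed=0):
    """Sample points in region=(lo, hi) asynchronously and keep the best one.

    There is no barrier between rounds: as soon as any task finishes, its
    result is merged, steer(best_x, region) may return a new region (None
    keeps the current one), and a replacement task is issued.
    """
    rng = random.Random(seed)
    best_val, best_x = float("inf"), None
    issued = 0
    pending = set()

    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit():
            nonlocal issued
            x = rng.uniform(*region)  # draw from the *current* region
            pending.add(pool.submit(lambda x=x: (objective(x), x)))
            issued += 1

        for _ in range(min(workers, budget)):
            submit()
        while pending:
            # Wake up as soon as any one task completes (asynchronous merge).
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                pending.discard(fut)
                val, x = fut.result()
                if val < best_val:
                    best_val, best_x = val, x
                region = steer(best_x, region) or region  # user interaction
                if issued < budget:
                    submit()
    return (best_val, best_x), region
```

A steering policy such as `lambda best_x, r: (best_x - 1, best_x + 1)` recentres the sampled region on the best point found so far. Fast workers keep drawing new work under the updated region without waiting for slow ones.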
A Cloud-Based Interactive Application Service
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.23
Nayden Markatchev, Roger Curry, C. Kiddle, Andrey Mirtchovski, R. Simmonds, Tingxi Tan
Accessing, running and sharing applications and data present researchers with many challenges. Cloud computing and social networking technologies have the potential to simplify or eliminate many of these challenges. Cloud computing technologies can give scientists transparent, on-demand access to applications served over the Internet in a dynamic and scalable manner. Social networking technologies provide a means of easily sharing applications and data. In this paper we present an online, on-demand interactive application service. The service is built on a cloud computing infrastructure that dynamically provisions virtualized application servers based on user demand. An open source social networking platform is leveraged to provide a portal front end through which researchers can easily share applications and results. Furthermore, the service works with existing (legacy) applications without requiring any modifications.
Citations: 20
Some Challenges Facing Scientific Software Developers: The Case of Molecular Biology
Pub Date : 2009-12-09 DOI: 10.1109/E-SCIENCE.2009.38
Chris Morris, J. Segal
It is apparent that the challenges facing scientific software developers are quite different from those facing their commercial counterparts. Among these differences are the challenges posed by the complex and uncertain nature of the science. In addition, many scientists have experience of developing their own software, albeit in a very restricted setting, which leads them to hold unrealistic expectations about software development in a different setting. In this paper, we explore the challenges facing scientific software developers, focusing especially on molecular biology. We claim that the nature and practice of molecular biology are quite different from those of the physical sciences and pose different problems for software developers. We do not claim that this paper is the last word on the topic, but we hope that it inspires further debate.
Citations: 18
The MOSFET Virtual Organisation: Grid Computing for Simulation in Nanoelectronics
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.45
R. Ferreiro, N. Seoane, M. Aldegunde, A. García-Loureiro
The planned replacement in 2010 of the Enabling Grids for E-sciencE (EGEE) project by the European Grid Initiative (EGI), under which each country's grid infrastructure will be run by a National Grid Initiative (NGI), is giving a boost to NGI development. In this context, the Spanish National Grid Initiative (es-NGI) is being developed by the Spanish e-Science Network. The es-NGI is setting up virtual organizations that group applications from a common area. One of these, the MOSFET virtual organisation (VO-MOSFET), was created in 2009 to perform semiconductor device simulations on the es-NGI infrastructure. This virtual organization is supported by the es-NGI resource centres and is developing a job submission and monitoring system that is independent of the grid middleware. This paper gives a general description of the VO-MOSFET and presents some application-level utilities of the job submission system. Finally, it presents an example of gridifying a nanoelectronic simulation, demonstrating the benefits of the grid for the field of nanoelectronic simulation.
Citations: 0
Increasing the Efficiency of Data Storage and Analysis Using Indexed Compression
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.18
N. Beagley, Chad Scherrer, Yan Shi, B. Clowers, W. Danielson, A. Shah
The massive data sets produced by the high-throughput, multidimensional mass spectrometry instruments used in proteomics create challenges in data acquisition, storage and analysis. Data compression can help mitigate some of these problems, but at the cost of less efficient data access, which directly increases the computational time of data analysis. We have developed a compression methodology that 1) is optimized for a targeted mass spectrometry proteomics data set and 2) provides the size and speed benefits of compression while increasing analysis efficiency, because segments of uncompressed data can be extracted from a file without decompressing the entire file. This paper describes our compression algorithm, presents comparative metrics of compressed size and speed, and explores approaches for applying the algorithm to a generalized data set.
Citations: 3
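The general principle behind indexed compression is to compress data in independent blocks and keep an index of where each block lives, so that a byte range can be read by decompressing only the blocks that overlap it. The sketch below shows this under simple assumptions: fixed-size blocks, `zlib` compression, and a list of `(offset, length)` pairs. It is a generic illustration with hypothetical function names, not the authors' algorithm, which the abstract says is optimized for targeted mass spectrometry proteomics data.

```python
import zlib


def compress_indexed(data: bytes, block_size: int = 4096):
    """Compress data as independent zlib blocks.

    Returns (blob, index), where index[i] = (offset, length) of compressed
    block i inside blob; block i covers data[i*block_size:(i+1)*block_size].
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        chunk = zlib.compress(data[start:start + block_size])
        index.append((len(blob), len(chunk)))
        blob.extend(chunk)
    return bytes(blob), index


def read_range(blob, index, block_size, start, end):
    """Return data[start:end] (non-negative bounds) by decompressing only the
    blocks that overlap the requested range, never the whole file."""
    if start >= end:
        return b""
    first = start // block_size
    last = min((end - 1) // block_size, len(index) - 1)
    out = bytearray()
    for i in range(first, last + 1):
        off, length = index[i]
        out.extend(zlib.decompress(blob[off:off + length]))
    skip = start - first * block_size  # offset of `start` inside block `first`
    return bytes(out[skip:skip + (end - start)])
```

The block size is the design trade-off. Smaller blocks make random reads cheaper but give the compressor less context, so the overall compression ratio drops. The same `block_size` must be used for writing and reading.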
Towards a Universal, Quantifiable, and Scalable File Format Converter
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.28
Kenton McHenry, R. Kooper, P. Bajcsy
This paper addresses the problem of designing a universal file format converter. File format conversion is a necessary part of data dissemination and curation. Complete and robust converters, however, are hard to find and build, because file formats are abundant, many formats are closed, and individual format specifications are complex. On the other hand, many existing software applications can perform some degree of data conversion between a subset of the available formats. To take advantage of this, we introduce a data structure called an I/O-Graph that stores the available input and output formats of these applications. Based on a concept of "imposed code reuse", we use this graph to develop a service, NCSA Polyglot, which can perform the larger union of conversions supported by the underlying software. The Polyglot system is designed to be easily extensible, to scale with the number of conversion requests, and to include all available third-party software. Given a data set of files from a particular domain, we can assign weights to the edges of the I/O-Graph indicating the amount of information retained during each conversion. These edge weights then allow the system to choose the conversion path with the least information loss.
Citations: 14
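One way to realise the I/O-Graph's least-loss path selection is shown below. Formats are nodes, and each application's conversion is a directed edge weighted by the fraction of information it retains. The sketch assumes retention values in (0, 1] that combine multiplicatively along a path. Under that assumption, maximising overall retention is a shortest-path problem on edge costs `-log(retention)`, solved here with Dijkstra's algorithm. The class and application names are hypothetical, and Polyglot's actual weighting scheme may differ.

```python
import heapq
import math
from collections import defaultdict


class IOGraph:
    """Hypothetical I/O-Graph: nodes are file formats; a directed edge
    src -> dst means some application can convert src to dst, weighted by
    the (assumed) fraction of information retained, in (0, 1]."""

    def __init__(self):
        self.edges = defaultdict(list)  # src -> [(dst, app, retention)]

    def add_conversion(self, app, src, dst, retention):
        if not 0.0 < retention <= 1.0:
            raise ValueError("retention must be in (0, 1]")
        self.edges[src].append((dst, app, retention))

    def best_path(self, src, dst):
        """Return (retention, steps) maximising the product of per-edge
        retention, or None if dst is unreachable. Uses Dijkstra on the
        non-negative cost -log(retention)."""
        dist = {src: 0.0}
        prev = {}
        heap = [(0.0, src)]
        while heap:
            cost, fmt = heapq.heappop(heap)
            if fmt == dst:
                break
            if cost > dist.get(fmt, math.inf):
                continue  # stale heap entry
            for nxt, app, r in self.edges[fmt]:
                c = cost - math.log(r)
                if c < dist.get(nxt, math.inf):
                    dist[nxt] = c
                    prev[nxt] = (fmt, app)
                    heapq.heappush(heap, (c, nxt))
        if dst not in dist:
            return None
        steps, node = [], dst
        while node != src:  # walk back along recorded predecessors
            parent, app = prev[node]
            steps.append((app, parent, node))
            node = parent
        steps.reverse()
        return math.exp(-dist[dst]), steps
```

For instance, if a direct `obj -> ply` conversion keeps 50% of the information but an `obj -> stl -> ply` chain keeps 90% at each step, the chain wins with about 81% overall retention.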
Running Parallel Applications with Topology-Aware Grid Middleware
Pub Date : 2009-12-09 DOI: 10.1109/e-Science.2009.48
P. Bar, Camille Coti, D. Groen, T. Hérault, Valentin Kravtsov, A. Schuster, M. Swain
The concept of topology-aware grid applications is derived from parallelized computational models of complex systems that are executed on heterogeneous resources, either because they require specialized hardware for certain calculations or because their parallelization is flexible enough to exploit such resources. Here we describe two such applications: a multi-body simulation of stellar evolution, and an evolutionary algorithm used for reverse-engineering gene regulatory networks. We then describe the topology-aware middleware we have developed to facilitate the "modeling-implementing-executing" cycle of complex systems applications. This middleware allows topology-aware simulations to run on geographically distributed clusters, with or without firewalls between them. Additionally, we describe advanced co-allocation and scheduling techniques that take the applications' topologies into account. We report results from running the topology-aware applications on the Grid'5000 infrastructure.
Citations: 17
Journal: 2009 Fifth IEEE International Conference on e-Science