2012 IEEE 8th International Conference on E-Science: Latest Publications

Lessons learned from Galaxy, a Web-based platform for high-throughput genomic analyses

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404442

Jeremy Goecks, The Galaxy Team, A. Nekrutenko, James Taylor

High throughput sequencing assays have given rise to the field of genomics and transformed biomedical research into a computational science. Due to the large size of genomics datasets, high-performance computing is essential for analysis. Galaxy (http://galaxyproject.org) is a popular Web-based platform that can be used for all facets of genomic analyses, including data retrieval and integration, multi-step analysis, repeated analyses via workflows, visualization, collaboration, and publication. This paper describes Galaxy and discusses four lessons learned from the development of Galaxy. First, Galaxy uses open, extensible frameworks so that it can be adapted to new technologies as they become available. Second, by leveraging Web technologies, Galaxy makes genomics tools accessible to everyone and provides a common platform for collaboration. Third, Galaxy fosters community amongst both developers and users and encourages each community to adapt and extend Galaxy to meet their needs. Finally, Galaxy software development and genomic research are closely coupled, and challenges encountered during genomic research drive Galaxy development.

Citations: 6

IPOL: Reviewed publication and public testing of research software

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404449

Nicolas Limare, L. Oudre, Pascal Getreuer

With the journal Image Processing On Line (IPOL), we propose to promote software to the status of regular research material and subject it to the same treatment as research papers: it must be reviewed, it must be reusable and verifiable by the research community, it must follow style and quality guidelines. In IPOL, algorithms are published with their implementation, codes are peer-reviewed, and a web-based test interface is attached to each of these articles. This results in more software released by the researchers, a better software quality achieved with the review process, and a large collection of test data gathered for each article. IPOL has been active since 2010, and has already published thirty articles.

Citations: 4

Prediction of protein solubility in E. coli

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404416

T. Samak, D. Gunter, Zhong Wang

Gene synthesis is a key step in converting digitally predicted proteins to functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of the synthesized proteins are not soluble, which further reduces the efficacy of gene synthesis as a method for protein function characterization. Solubility prediction from primary protein sequences holds the promise to dramatically reduce the cost of gene synthesis. This work presents a framework that creates models of solubility from sequence information. From the primary protein sequences of the genes to be synthesized, sequence features can be used to build computational models for solubility. This way, biologists can focus their effort on synthesizing genes that are highly likely to generate soluble proteins. We have developed a framework that employs several machine learning algorithms to model protein solubility. The framework is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach successfully predicted solubility with more than 80% accuracy, and enabled in-depth analysis of the most important features affecting solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison between different learning algorithms, and insightful feature analysis.
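To make the idea of "sequence features feeding a binary predictor" concrete, here is a minimal Python sketch. The abstract does not say which features or learning algorithms the authors used, so everything here is an assumption for illustration: the amino-acid composition feature, the nearest-centroid classifier, and all names (`composition`, `NearestCentroid`) are hypothetical stand-ins. The training strings are made-up toy sequences chosen only so the two classes are separable; they are not real proteins and carry no biological claim.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the fraction of each standard amino acid in a primary sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


class NearestCentroid:
    """Toy binary classifier: assign the label of the closest class centroid."""

    def fit(self, seqs, labels):
        feats = [composition(s) for s in seqs]
        self.centroids = {
            label: centroid([f for f, l in zip(feats, labels) if l == label])
            for label in set(labels)
        }
        return self

    def predict(self, seq):
        f = composition(seq)
        return min(self.centroids, key=lambda lab: sq_dist(f, self.centroids[lab]))


if __name__ == "__main__":
    # Hypothetical toy data: True = "soluble", False = "insoluble".
    train_seqs = ["DEKKDEEK", "KDEEKKDE", "LLVVIIFF", "FIVLLVIF"]
    train_labels = [True, True, False, False]
    model = NearestCentroid().fit(train_seqs, train_labels)
    print(model.predict("EEKKDD"))
```

Because the classifier only sees a feature vector and a binary label, the same skeleton would apply to any other sequence-derived feature set or any other yes/no property, which is the sense in which the abstract calls the pipeline general.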

Citations: 12

Dynamic network provisioning for data intensive applications in the cloud

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404461

P. Ruth, A. Mandal, Yufeng Xin, I. Baldine, Chris Heermann, J. Chase

Advanced networks are an essential element of data-driven science enabled by next-generation cyberinfrastructure environments. Computational activities increasingly incorporate widely dispersed resources with linkages among software components spanning multiple sites and administrative domains. We have seen recent advances in enabling on-demand network circuits in the national and international backbones, coupled with Software Defined Networking (SDN) advances like OpenFlow and programmable edge technologies like OpenStack. These advances have created an unprecedented opportunity to enable complex scientific applications to run on specially tailored, dynamic infrastructure that includes compute, storage, and network resources, combining the performance advantages of purpose-built infrastructures without the costs of a permanent infrastructure. This work describes our experience deploying scientific workflows on the ExoGENI national test bed, which dynamically allocates computational resources with high-speed circuits from backbone providers. Dynamically allocated, bandwidth-provisioned high-speed circuits increase the ability of scientific applications to access and stage large data sets from remote data repositories or to move computation to remote sites and access data stored locally. The remainder of this extended abstract is a brief description of the test bed and several scientific workflow applications that were deployed using bandwidth-provisioned high-speed circuits.

Citations: 4

Enabling scientific data sharing and re-use

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404475

B. Minsker, T. Wietsma

Research data sharing is one of the key challenges in the e-science era. Information technologies facilitate enhanced management and sharing of research data. It is crucial to understand the current status of research data sharing in order to facilitate enhanced data sharing in the future. In this study, a conceptual model has been developed to characterize the process of data sharing and the factors which give rise to variations in data re-use. The study goes beyond a solely technical analysis and also includes psychological, social, organizational, legal and political components. The model was developed based on the literature and 21 face-to-face interviews with research, funding, data centre and publishing experts. It was validated by both a vigorous workshop and a further 55 structured telephone interviews. The overall model identifies sub-models of process, of context, and of drivers, barriers and enablers. These provide a comprehensive description of the factors that enable or inhibit the sharing of research data. They affect whether data are shared, how they are shared, and how successfully they are shared. Implementing the enablers will help the research community overcome the barriers to data re-use to facilitate future e-science endeavors.

Citations: 27

Collaborative information management in scientific research processes

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404478

S. Crompton, B. Matthews, Erica Y. Yang, C. Neylon, S. Coles

Research is an incremental process that both generates and consumes diverse artifacts over its lifetime. A typical research lifecycle may involve creating experimental or observational data using multiple facilities or instruments; refining raw data into derived data to test hypotheses; and publishing and presenting the findings in various formats. Each stage of this process commonly involves support systems with independent management; this, however, hinders e-scholarship, as human mediation is required to track and access related research outputs. In this paper, we describe a collaborative research information management infrastructure based on STFC facilities. The pilot system uses the InteRCom peer-to-peer protocol to propagate typed links between digital contents spread across repositories. The resultant linked web of data offers a simple but versatile solution to the tracking of research outputs in context, as these semantically annotated links form a graph of citation and provenance which can be analyzed, traversed or aggregated according to the link resource or property of interest.
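The "graph of typed links" described above can be pictured with a small in-memory sketch. This is not the InteRCom protocol, whose details the abstract does not give; the class name `LinkGraph`, the link-type labels (`cites`, `derivedFrom`, `generatedBy`), and the resource identifiers are all hypothetical and chosen only to show how typed edges let a reader traverse or filter by the property of interest.

```python
from collections import defaultdict, deque


class LinkGraph:
    """Directed graph whose edges carry a link type (e.g. citation, provenance)."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(link_type, target), ...]

    def add_link(self, source, link_type, target):
        self._out[source].append((link_type, target))

    def targets(self, source, link_type=None):
        """Direct neighbours of `source`, optionally restricted to one link type."""
        return [
            target
            for lt, target in self._out.get(source, [])
            if link_type is None or lt == link_type
        ]

    def reachable(self, start, link_types=None):
        """Breadth-first traversal that follows only the given link types."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            node = queue.popleft()
            for lt, target in self._out.get(node, []):
                if link_types is not None and lt not in link_types:
                    continue
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
                    order.append(target)
        return order


if __name__ == "__main__":
    g = LinkGraph()
    # Hypothetical identifiers: a paper, a derived dataset, raw data, an instrument.
    g.add_link("paper:1", "cites", "dataset:derived")
    g.add_link("dataset:derived", "derivedFrom", "dataset:raw")
    g.add_link("dataset:raw", "generatedBy", "instrument:X")
    print(g.reachable("paper:1"))             # full chain behind the paper
    print(g.reachable("paper:1", {"cites"}))  # citation links only
```

Filtering the traversal by link type is what allows the same graph to answer both "what does this paper cite?" and "where did this data come from?" without separate stores.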

Citations: 3

Open Social based group access control framework for e-Science data infrastructure

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404488

Hui Zhang, Wenjun Wu, ZhenAn Li

In an e-Science data infrastructure, access control is a vital component to facilitate the management of the collective data and computing resources shared by researchers from geographically distributed locations. But conventional virtual organization based access control frameworks are not suitable for self-organizing, ad-hoc and opportunistic scientific collaborations, in which scientists can easily set up group-oriented authorization rules across the administrative domains. Using the emerging OAuth2.0 protocol, this paper introduces a novel Open Social based access control framework to support ad-hoc team formation and user-controlled resource sharing. Our experiences with development of the framework in e-Science data infrastructure projects demonstrate that the proposed framework is a very promising approach to resource sharing in cross-domain e-science environments.
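As a rough illustration of ad-hoc team formation with user-controlled sharing, the sketch below models only the authorization decision: any user can create a group, and a resource owner decides which groups get which permissions. It is an assumption-laden toy, not the framework from the paper. OAuth 2.0 token issuance and validation and the Open Social APIs are deliberately left out, and every name (`AccessController`, `create_group`, `share`, `is_allowed`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    owner: str
    members: set = field(default_factory=set)


@dataclass
class Resource:
    name: str
    owner: str
    shared_with: dict = field(default_factory=dict)  # group name -> set of permissions


class AccessController:
    """In-memory group-based access checks; identity is assumed already verified."""

    def __init__(self):
        self.groups = {}
        self.resources = {}

    def create_group(self, owner, name):
        # Any user may form a group on demand and becomes its first member.
        self.groups[name] = Group(name, owner, {owner})

    def add_member(self, actor, group_name, user):
        group = self.groups[group_name]
        if actor != group.owner:
            raise PermissionError("only the group owner may add members")
        group.members.add(user)

    def register(self, owner, resource_name):
        self.resources[resource_name] = Resource(resource_name, owner)

    def share(self, actor, resource_name, group_name, permissions):
        resource = self.resources[resource_name]
        if actor != resource.owner:
            raise PermissionError("only the resource owner may share it")
        resource.shared_with.setdefault(group_name, set()).update(permissions)

    def is_allowed(self, user, resource_name, permission):
        resource = self.resources[resource_name]
        if user == resource.owner:
            return True
        return any(
            permission in perms and user in self.groups[group_name].members
            for group_name, perms in resource.shared_with.items()
        )


if __name__ == "__main__":
    ac = AccessController()
    ac.create_group("alice", "field-campaign")
    ac.add_member("alice", "field-campaign", "bob")
    ac.register("alice", "dataset-42")
    ac.share("alice", "dataset-42", "field-campaign", {"read"})
    print(ac.is_allowed("bob", "dataset-42", "read"))
```

In a real deployment the `user` argument would come from a validated access token rather than a plain string, which is the part an OAuth 2.0 integration would supply.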

Citations: 7

Enabling large genomic data transfers using nation-wide and international dynamic lightpaths

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404458

J. Bot, M. D. Vos, S. Boele, M. Reinders, J. Kok

The recent advances made in high throughput genomic sequencing allow researchers to accurately determine the genetic make-up of an individual. Sharing this data across research institutes has proven to be challenging as the amount of data and available bandwidth cause large delays. Here, we present a network of dynamic lightpaths dedicated to the life sciences which connects research groups within the Netherlands to each other, to compute and storage providers and to commercial partners.

Citations: 6

Calibration of watershed models using cloud computing

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404420

M. Humphrey, N. Beekwilder, J. Goodall, M. Ercan

Understanding hydrologic systems at the scale of large watersheds and river basins is critically important to society when faced with extreme events, such as floods and droughts, or with concerns about water quality. A critical requirement of watershed modeling is model calibration, in which the computational model's parameters are varied during a search algorithm in order to find the best match against physically-observed phenomena such as streamflow. Because it is generally performed on a laptop computer, this calibration phase can be very time-consuming, significantly limiting the ability of a hydrologist to experiment with different models. In this paper, we describe our system for watershed model calibration using cloud computing, specifically Microsoft Windows Azure. With a representative watershed model whose calibration takes 11.4 hours on a commodity laptop, our cloud-based system calibrates the watershed model in 43.32 minutes using 16 cloud cores (15.78x speedup), 11.76 minutes using 64 cloud cores (58.13x speedup), and 5.03 minutes using 256 cloud cores (135.89x speedup). We believe that such speed-ups offer the potential toward real-time interactive model creation with continuous calibration, ushering in a new paradigm for watershed modeling.
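The speedup figures quoted above follow from dividing the laptop time by the cloud time, and dividing again by the core count gives parallel efficiency. The short Python sketch below recomputes both from the numbers in the abstract; the function and variable names are illustrative only, and the efficiency column is a derived quantity that the abstract itself does not report.

```python
# Times taken from the abstract: 11.4 hours on a laptop, minutes on N cloud cores.
SERIAL_MINUTES = 11.4 * 60
RUNS = {16: 43.32, 64: 11.76, 256: 5.03}  # cores -> calibration time in minutes


def speedup(serial_minutes, parallel_minutes):
    """How many times faster the parallel run is than the serial baseline."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, cores):
    """Speedup per core; 1.0 would be perfect linear scaling."""
    return speedup(serial_minutes, parallel_minutes) / cores


def report():
    """Return (cores, speedup, efficiency) rows for every run in RUNS."""
    return [
        (cores, speedup(SERIAL_MINUTES, m), efficiency(SERIAL_MINUTES, m, cores))
        for cores, m in sorted(RUNS.items())
    ]


if __name__ == "__main__":
    for cores, s, e in report():
        print(f"{cores:>3} cores: speedup {s:6.2f}x, efficiency {e:.0%}")
```

The recomputed speedups land within about 0.1 of the quoted 15.78x, 58.13x, and 135.89x, with the small gap plausibly due to the 11.4-hour baseline being rounded. Efficiency falls from roughly 99% at 16 cores to about 91% at 64 and 53% at 256, so each added core contributes less as the core count grows.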

Citations: 35

An integrated science portal for collaborative compute and data intensive protein structure studies

2012 IEEE 8th International Conference on E-Science

Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404425

I. Stokes-Rees, D. O'Donovan, Peter Doherty, Meghan Porter-Mahoney, P. Śliż

The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.

Citations: 2
