2012 IEEE 8th International Conference on E-Science: Latest Papers

ExSciTecH: Expanding volunteer computing to Explore Science, Technology, and Health
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404451
Michael Matheny, S. Schlachter, L. M. Crouse, E. T. Kimmel, Trilce Estrada, Marcel Schumann, R. Armen, G. Zoppetti, M. Taufer
This paper presents ExSciTecH, an NSF-funded project deploying volunteer computing (VC) systems to Explore Science, Technology, and Health. ExSciTecH aims at radically transforming VC systems and the volunteer's experience. To pursue this goal, ExSciTecH integrates gameplay environments into BOINC, a well-known VC middleware, so that volunteers not only donate idle cycles but also actively participate in scientific discovery, i.e., generate new simulations side by side with the scientists. More specifically, ExSciTecH plugs into the BOINC framework and extends it with two main gaming components: a learning component that includes a suite of games for training users on relevant biochemical concepts, and an engaging component that includes a suite of games to engage volunteers in drug design and scientific discovery. We assessed the impact of a first implementation of the learning game on a group of students at the University of Delaware. Our tests clearly show how ExSciTecH can generate higher levels of enthusiasm in our students than more traditional learning tools.
Citations: 3
Image retrieval in the unstructured data management system AUDR
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404474
Junwu Luo, B. Lang, Chao Tian, Danchen Zhang
The explosive growth of image data poses severe challenges to traditional image retrieval methods. In order to manage massive image collections more accurately and efficiently, this paper first proposes a scalable architecture for image retrieval based on a uniform data model and makes this function a sub-engine of AUDR, an advanced unstructured data management system that can simultaneously manage several kinds of unstructured data, including image, video, audio and text. The paper then proposes a new image retrieval algorithm, which incorporates rich visual features and two text models for multi-modal retrieval. Experiments on both the ImageNet dataset and the ImageCLEF medical dataset show that the proposed architecture and the new retrieval algorithm are appropriate for efficient management of massive image collections.
Citations: 6
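The AUDR abstract above describes multi-modal retrieval that combines visual features with text models, without giving the scoring details. One common way to combine two modalities is a weighted sum of per-modality similarities; the sketch below illustrates that idea only. The weighted-sum fusion, the `visual_weight` parameter, and the dictionary layout of items are assumptions for illustration, not the paper's algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fused_score(query, item, visual_weight=0.5):
    """Weighted sum of visual and text similarity (hypothetical fusion rule)."""
    visual = cosine(query["visual"], item["visual"])
    text = cosine(query["text"], item["text"])
    return visual_weight * visual + (1.0 - visual_weight) * text


def rank(query, items, visual_weight=0.5):
    """Return items sorted from best to worst fused score."""
    return sorted(
        items,
        key=lambda item: fused_score(query, item, visual_weight),
        reverse=True,
    )


if __name__ == "__main__":
    # Toy feature vectors, invented purely to exercise the functions.
    query = {"visual": [1.0, 0.0], "text": [0.0, 1.0]}
    items = [
        {"id": "a", "visual": [1.0, 0.0], "text": [1.0, 0.0]},
        {"id": "b", "visual": [1.0, 0.0], "text": [0.0, 1.0]},
    ]
    print([item["id"] for item in rank(query, items)])
```

Raising `visual_weight` toward 1.0 makes the ranking depend mostly on visual similarity; lowering it favors the text match.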
Prediction of protein solubility in E. coli
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404416
T. Samak, D. Gunter, Zhong Wang
Gene synthesis is a key step in converting digitally predicted proteins to functional proteins. However, it is a relatively expensive and labor-intensive process. About 30-50% of the synthesized proteins are not soluble, which further reduces the efficacy of gene synthesis as a method for protein function characterization. Solubility prediction from primary protein sequences holds the promise of dramatically reducing the cost of gene synthesis. This work presents a framework that creates models of solubility from sequence information. From the primary protein sequences of the genes to be synthesized, sequence features can be used to build computational models for solubility. This way, biologists can focus their effort on synthesizing genes that are highly likely to generate soluble proteins. We have developed a framework that employs several machine learning algorithms to model protein solubility. The framework is used to predict protein solubility in the Escherichia coli expression system. The analysis is performed on over 1,600 quantified proteins. The approach predicted solubility with more than 80% accuracy and enabled in-depth analysis of the most important features affecting solubility. The analysis pipeline is general and can be applied to any set of sequence features to predict any binary measure. The framework also provides the biologist with a comprehensive comparison between different learning algorithms, and insightful feature analysis.
Citations: 12
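The solubility abstract above says sequence features are computed from primary protein sequences and fed to classifiers that predict a binary outcome, but it does not list the features. As a rough illustration of what "sequence features" can look like, the sketch below computes amino acid composition and a hydrophobic fraction, plus a plain accuracy measure for binary predictions. The feature choice and the `HYDROPHOBIC` residue grouping are assumptions, not the paper's feature set.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
# Assumption: one common grouping of hydrophobic residues; other groupings exist.
HYDROPHOBIC = frozenset("AVILMFWC")


def composition(sequence):
    """Fraction of each standard amino acid in the sequence (unknown letters ignored)."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return {aa: (n / total if total else 0.0) for aa, n in counts.items()}


def hydrophobic_fraction(sequence):
    """Share of residues that fall in the (assumed) hydrophobic group."""
    comp = composition(sequence)
    return sum(comp[aa] for aa in HYDROPHOBIC)


def accuracy(predicted, actual):
    """Fraction of binary predictions that match the observed labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


if __name__ == "__main__":
    toy = "AAGG"  # toy sequence, not a real protein
    print(composition(toy)["A"], hydrophobic_fraction(toy))
```

A feature vector built this way (20 composition values plus any derived fractions) could be passed to whichever learning algorithm is being compared; the abstract's "more than 80% accuracy" corresponds to the kind of quantity `accuracy` returns.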
Lessons learned from Galaxy, a Web-based platform for high-throughput genomic analyses
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404442
Jeremy Goecks, The Galaxy Team, A. Nekrutenko, James Taylor
High throughput sequencing assays have given rise to the field of genomics and transformed biomedical research into a computational science. Due to the large size of genomics datasets, high-performance computing is essential for analysis. Galaxy (http://galaxyproject.org) is a popular Web-based platform that can be used for all facets of genomic analyses, including data retrieval and integration, multi-step analysis, repeated analyses via workflows, visualization, collaboration, and publication. This paper describes Galaxy and discusses four lessons learned from the development of Galaxy. First, Galaxy uses open, extensible frameworks so that it can be adapted to new technologies as they become available. Second, by leveraging Web technologies, Galaxy makes genomics tools accessible to everyone and provides a common platform for collaboration. Third, Galaxy fosters community amongst both developers and users and encourages each community to adapt and extend Galaxy to meet their needs. Finally, Galaxy software development and genomic research are closely coupled, and challenges encountered during genomic research drive Galaxy development.
Citations: 6
Enabling scientific data sharing and re-use
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404475
B. Minsker, T. Wietsma
Research data sharing is one of the key challenges in the e-science era. Information technologies facilitate enhanced management and sharing of research data. It is crucial to understand the current status of research data sharing in order to facilitate enhanced data sharing in the future. In this study, a conceptual model has been developed to characterize the process of data sharing and the factors which give rise to variations in data re-use. The study goes beyond a solely technical analysis and also includes psychological, social, organizational, legal and political components. The model was developed based on the literature and 21 face-to-face interviews with research, funding, data centre and publishing experts. It was validated by both a vigorous workshop and a further 55 structured telephone interviews. The overall model identifies sub-models of process, of context, and of drivers, barriers and enablers. These provide a comprehensive description of the factors that enable or inhibit the sharing of research data. They affect whether data are shared, how they are shared, and how successfully they are shared. Implementing the enablers will help the research community overcome the barriers to data re-use and so facilitate future e-science endeavors.
Citations: 27
Collaborative information management in scientific research processes
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404478
S. Crompton, B. Matthews, Erica Y. Yang, C. Neylon, S. Coles
Research is an incremental process that both generates and consumes diverse artifacts over its lifetime. A typical research lifecycle may involve creating experimental or observational data using multiple facilities or instruments; refining raw data into derived data to test hypotheses; and publishing and presenting the findings in various formats. Each stage of this process commonly involves support systems with independent management; this, however, hinders e-scholarship, as human mediation is required to track and access related research outputs. In this paper, we describe a collaborative research information management infrastructure based on STFC facilities. The pilot system uses the InteRCom peer-to-peer protocol to propagate typed links between digital contents spread across repositories. The resultant linked web of data offers a simple but versatile solution to the tracking of research outputs in context, as these semantically annotated links form a graph of citation and provenance which can be analyzed, traversed or aggregated according to the link resource or property of interest.
Citations: 3
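The collaborative information management abstract above describes typed links between research outputs that form a graph which can be traversed or filtered by link property. The sketch below shows that data structure in its simplest in-memory form. It does not implement the InteRCom protocol or any repository API; the `LinkGraph` class, its method names, and the link type strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class LinkGraph:
    """In-memory graph of typed, directed links between resource identifiers."""

    def __init__(self):
        # source id -> list of (link type, target id), in insertion order
        self._outgoing = defaultdict(list)

    def add_link(self, source, link_type, target):
        self._outgoing[source].append((link_type, target))

    def targets(self, source, link_type=None):
        """Direct targets of source, optionally restricted to one link type."""
        return [
            target
            for kind, target in self._outgoing.get(source, [])
            if link_type is None or kind == link_type
        ]

    def reachable(self, start, link_type=None):
        """Breadth-first list of everything reachable from start via matching links."""
        seen = {start}
        queue = deque([start])
        order = []
        while queue:
            node = queue.popleft()
            for target in self.targets(node, link_type):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


if __name__ == "__main__":
    graph = LinkGraph()
    # Hypothetical identifiers and link types.
    graph.add_link("paper", "cites", "derived-data")
    graph.add_link("derived-data", "derivedFrom", "raw-data")
    graph.add_link("paper", "describes", "instrument")
    print(graph.reachable("paper"))
```

Following only one link type (for example a provenance-style link) from a dataset yields its lineage, while following all types from a publication gathers every related output.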
Open Social based group access control framework for e-Science data infrastructure
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404488
Hui Zhang, Wenjun Wu, ZhenAn Li
In an e-Science data infrastructure, access control is a vital component to facilitate the management of the collective data and computing resources shared by researchers from geographically distributed locations. But conventional virtual organization based access control frameworks are not suitable for self-organizing, ad-hoc and opportunistic scientific collaborations, in which scientists can easily set up group-oriented authorization rules across the administrative domains. Using the emerging OAuth2.0 protocol, this paper introduces a novel Open Social based access control framework to support ad-hoc team formation and user-controlled resource sharing. Our experiences with development of the framework in e-Science data infrastructure projects demonstrate that the proposed framework is a very promising approach to resource sharing in cross-domain e-science environments.
Citations: 7
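The access control abstract above centers on ad-hoc groups and user-controlled sharing, with OAuth 2.0 used as the underlying protocol. The sketch below illustrates only the group-oriented authorization check, as a plain in-memory structure. It assumes the caller has already established the user's identity (for example from a validated OAuth 2.0 access token, which is not modeled here). The `GroupAccessControl` class and its method names are hypothetical and are not taken from the paper or from any Open Social API.

```python
class GroupAccessControl:
    """Minimal group-based authorization: users join groups, groups get grants."""

    def __init__(self):
        self._members = {}  # group name -> set of user ids
        self._grants = {}   # resource id -> {group name: set of allowed actions}

    def create_group(self, group, owner):
        """Create an ad-hoc group whose first member is its owner."""
        self._members[group] = {owner}

    def add_member(self, group, user):
        self._members[group].add(user)

    def grant(self, resource, group, action):
        """Allow every member of group to perform action on resource."""
        self._grants.setdefault(resource, {}).setdefault(group, set()).add(action)

    def is_allowed(self, user, resource, action):
        """True if any group containing user holds a grant for action on resource."""
        for group, actions in self._grants.get(resource, {}).items():
            if action in actions and user in self._members.get(group, set()):
                return True
        return False


if __name__ == "__main__":
    acl = GroupAccessControl()
    # Hypothetical users, group, and dataset names.
    acl.create_group("field-team", owner="alice")
    acl.add_member("field-team", "bob")
    acl.grant("dataset-1", "field-team", "read")
    print(acl.is_allowed("bob", "dataset-1", "read"))
    print(acl.is_allowed("bob", "dataset-1", "write"))
```

Keeping membership and grants separate means a resource owner can share with a whole team in one step, and adding a collaborator to the group later extends access without touching the resource.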
Enabling large genomic data transfers using nation-wide and international dynamic lightpaths
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404458
J. Bot, M. D. Vos, S. Boele, M. Reinders, J. Kok
The recent advances made in high throughput genomic sequencing allow researchers to accurately determine the genetic make-up of an individual. Sharing this data across research institutes has proven to be challenging as the amount of data and available bandwidth cause large delays. Here, we present a network of dynamic lightpaths dedicated to the life sciences which connects research groups within the Netherlands to each other, to compute and storage providers and to commercial partners.
Citations: 6
Calibration of watershed models using cloud computing
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404420
M. Humphrey, N. Beekwilder, J. Goodall, M. Ercan
Understanding hydrologic systems at the scale of large watersheds and river basins is critically important to society when faced with extreme events, such as floods and droughts, or with concerns about water quality. A critical requirement of watershed modeling is model calibration, in which the computational model's parameters are varied during a search algorithm in order to find the best match against physically-observed phenomena such as streamflow. Because it is generally performed on a laptop computer, this calibration phase can be very time-consuming, significantly limiting the ability of a hydrologist to experiment with different models. In this paper, we describe our system for watershed model calibration using cloud computing, specifically Microsoft Windows Azure. With a representative watershed model whose calibration takes 11.4 hours on a commodity laptop, our cloud-based system calibrates the watershed model in 43.32 minutes using 16 cloud cores (15.78x speedup), 11.76 minutes using 64 cloud cores (58.13x speedup), and 5.03 minutes using 256 cloud cores (135.89x speedup). We believe that such speed-ups offer the potential toward real-time interactive model creation with continuous calibration, ushering in a new paradigm for watershed modeling.
Citations: 35
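The watershed calibration abstract above reports run times and speedups for 16, 64, and 256 cloud cores. Dividing each reported speedup by its core count gives the parallel efficiency, which shows how much of the added hardware is converted into faster calibration. The sketch below does that arithmetic using only the figures quoted in the abstract; the function names are illustrative. It also multiplies each run time by its speedup to recover the implied laptop baseline, which comes out near 683.6 minutes (about 11.39 hours) and so agrees with the rounded 11.4 hours in the abstract.

```python
# (cores, minutes, reported speedup), copied from the abstract
REPORTED_RUNS = [
    (16, 43.32, 15.78),
    (64, 11.76, 58.13),
    (256, 5.03, 135.89),
]


def parallel_efficiency(speedup, cores):
    """Speedup per core; 1.0 would mean perfectly linear scaling."""
    return speedup / cores


def implied_baseline_minutes(minutes, speedup):
    """Single-machine run time implied by a parallel run time and its speedup."""
    return minutes * speedup


if __name__ == "__main__":
    for cores, minutes, speedup in REPORTED_RUNS:
        efficiency = parallel_efficiency(speedup, cores)
        baseline_hours = implied_baseline_minutes(minutes, speedup) / 60.0
        print(
            f"{cores:>3} cores: efficiency {efficiency:.3f}, "
            f"implied baseline {baseline_hours:.2f} h"
        )
```

Efficiency falls from about 0.986 at 16 cores to about 0.908 at 64 cores and about 0.531 at 256 cores, so the largest configuration finishes fastest but uses each core roughly half as effectively.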
An integrated science portal for collaborative compute and data intensive protein structure studies
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404425
I. Stokes-Rees, D. O'Donovan, Peter Doherty, Meghan Porter-Mahoney, P. Śliż
The SBGrid Science Portal provides multi-modal access to computational infrastructure, data storage, and data analysis tools for the structural biology community. It incorporates features not previously seen in cyberinfrastructure science gateways. It enables researchers to securely share a computational study area, including large volumes of data and active computational workflows. A rich identity management system has been developed that simplifies federated access to US national cyberinfrastructure, distributed data storage, and high performance file transfer tools. It integrates components from the Virtual Data Toolkit, Condor, glideinWMS, the Globus Toolkit and Globus Online, the FreeIPA identity management system, Apache web server, and the Django web framework.
Citations: 2
Journal: 2012 IEEE 8th International Conference on E-Science