2011 IEEE Seventh International Conference on e-Science Workshops: Latest Publications

Watershed Reanalysis: Towards a National Strategy for Model-Data Integration
Pub Date : 2011-12-05 DOI: 10.1109/ESCIENCEW.2011.32
C. Duffy, Lorne Leonard, G. Bhatt, Xuan Yu, C. Lee Giles
Reanalysis, or retrospective analysis, is the process of re-analyzing and assimilating climate and weather observations within the current modeling context. It is an objective, quantitative method of synthesizing all sources of information (historical and real-time observations) within a unified framework. In this context, we propose a prototype for automated and virtualized web services software that uses national data products for climate reanalysis, soils, geology, terrain and land cover for water resource simulation, prediction, data assimilation, calibration and archival. The prototype for model-data integration focuses on creating tools for fast data storage from selected national databases, as well as on the computational resources needed for dynamic, distributed watershed prediction anywhere in the continental US. In the future, the implementation of virtualized services will benefit from the development of a cloud cyberinfrastructure as the prototype evolves toward data- and model-intensive computation for continental-scale water resource predictions.
Citations: 7
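Reanalysis rests on data assimilation: combining a model's prior estimate with observations, each weighted by its uncertainty. As a generic illustration only (the abstract does not specify the prototype's assimilation scheme), here is a minimal scalar sketch of the optimal-interpolation / Kalman update, assuming Gaussian errors with known variances and a trivial persistence model; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def analysis_update(background: float, obs: float,
                    var_b: float, var_o: float) -> Tuple[float, float]:
    """Blend a model background with one direct observation of the same quantity.

    var_b and var_o are the (assumed known) error variances of the
    background and of the observation. Returns (analysis, analysis_variance).
    """
    gain = var_b / (var_b + var_o)          # Kalman gain for a direct observation (H = 1)
    analysis = background + gain * (obs - background)
    var_a = (1.0 - gain) * var_b            # analysis uncertainty is smaller than the background's
    return analysis, var_a


def assimilate_series(x0: float, var0: float, observations: List[Optional[float]],
                      var_o: float, var_q: float) -> List[float]:
    """Sequential assimilation with a persistence model (forecast = previous analysis)."""
    x, var = x0, var0
    history = []
    for y in observations:
        var += var_q                        # forecast step: model error inflates uncertainty
        if y is not None:                   # missing observations leave the forecast unchanged
            x, var = analysis_update(x, y, var, var_o)
        history.append(x)
    return history
```

With equal background and observation variances, `analysis_update(10.0, 14.0, 4.0, 4.0)` lands halfway between the two values, `(12.0, 2.0)`, and halves the variance.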
Data Decomposition in Biomedical e-Science Applications
Pub Date : 2011-12-05 DOI: 10.1109/eScienceW.2011.7
Yassene Mohammed, Shayan Shahand, V. Korkhov, Angela C. M. Luyf, B. V. Schaik, M. Caan, A. V. Kampen, Magnus Palmblad, S. Olabarriaga
As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the e-Science infrastructure in use. Such architectures are in general job-driven, i.e., a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure requires a parallelized application, which is achieved foremost by data decomposition. In general parallel programming practice, data decomposition depends on the programmer's experience and knowledge of the data and the algorithm/application used. Data mining scientists, on the other hand, have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns have been defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decomposition used to gain parallelism fits, to some degree, a subgroup of the data mining decomposition patterns, namely object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment, two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
Citations: 3
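Object set decomposition suits applications whose inputs are independent objects (for instance, one image per subject or one sequencing file per sample): the set is partitioned, and each partition becomes a separate grid job that runs the unchanged executable on one worker node. A minimal sketch, assuming the objects are independent and roughly equal in cost; the function and file names are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def object_set_decomposition(objects: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Split independent input objects into at most n_jobs near-equal chunks.

    Each chunk is meant to be the input of one grid job, so the original
    executable can be run unchanged on every worker node.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not objects:
        return []
    n_jobs = min(n_jobs, len(objects))       # never create empty jobs
    base, extra = divmod(len(objects), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks take one more object
        chunks.append(list(objects[start:start + size]))
        start += size
    return chunks
```

For seven hypothetical scans split over three jobs, the chunk sizes are 3, 2 and 2; concatenating the chunks gives back the original list, so no object is lost or duplicated.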
Effective Computational Method for Evaluation of Dynamic Electrostatic Effects of Explicit Solvent and Membrane Molecules from Molecular Dynamics Simulations
Pub Date : 2011-12-05 DOI: 10.1109/eScienceW.2011.18
Y. Yonezawa
Knowledge of the electronic structures of local functional sites of proteins sheds light on the fundamental mechanisms of enzymatic reactions and of processes related to electronic states. Although the dynamic effects of the solvent or membrane molecules surrounding a protein are indispensable for an accurate analysis, current methods approximate them with a continuum model of polarizable material, which always requires a phenomenological and unreliable parameter, the dielectric constant. We have developed a new algorithm that reproduces the average field due to the solvent and membrane molecules, computed from a long classical molecular dynamics trajectory of a membrane protein-solvent system, by means of several thousand pseudo-charges and dipoles on a closed surface surrounding the target quantum mechanical (QM) region. Because the dynamic effects are represented only by "static" pseudo-charges and dipoles, the QM calculation needs to be performed only once. We applied this algorithm to the photosynthetic reaction center of Rhodobacter sphaeroides, using explicit all-atom models of the solvent and membrane molecules. This makes it possible to calculate the electronic structures of its ground and excited states including these microscopic "reaction field" effects.
Citations: 0
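The key idea is to replace a field averaged over many MD frames with a fixed set of charges on a surface around the QM region, so the QM calculation sees one static environment. The sketch below is a heavily simplified, hypothetical illustration of that idea and not the paper's algorithm, whose fitting procedure the abstract does not give. It assumes point charges only (no dipoles), a sphere as the closed surface, synthetic random "frames" in arbitrary units, and a least-squares match of the potential at sample points inside the QM region. It requires NumPy.

```python
import numpy as np


def coulomb_matrix(targets: np.ndarray, sources: np.ndarray) -> np.ndarray:
    """Potential at each target point from a unit charge at each source (1/r, arbitrary units)."""
    diff = targets[:, None, :] - sources[None, :, :]
    return 1.0 / np.linalg.norm(diff, axis=-1)


def fibonacci_sphere(n: int, radius: float) -> np.ndarray:
    """Roughly uniform points on a sphere (stand-in for the closed surface)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.stack([np.cos(theta) * np.sin(phi),
                              np.sin(theta) * np.sin(phi),
                              np.cos(phi)], axis=1)


def random_points(rng: np.random.Generator, n: int, r_min: float, r_max: float) -> np.ndarray:
    """Random directions with radii drawn uniformly from [r_min, r_max)."""
    v = rng.normal(size=(n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return v * rng.uniform(r_min, r_max, size=(n, 1))


rng = np.random.default_rng(0)

# Hypothetical "trajectory": 100 frames of 30 solvent point charges between radius 3 and 6.
frames = [(random_points(rng, 30, 3.0, 6.0), rng.choice([-1.0, 1.0], size=30))
          for _ in range(100)]

# Sample points inside the QM region (radius 0.5) where the field must be reproduced.
samples = random_points(rng, 20, 0.0, 0.5)

# Target: potential at the sample points, averaged over all frames.
target = np.mean([coulomb_matrix(samples, pos) @ q for pos, q in frames], axis=0)

# Fit static pseudo-charges on a sphere of radius 2 by least squares.
surface = fibonacci_sphere(50, 2.0)
A = coulomb_matrix(samples, surface)
pseudo_q, *_ = np.linalg.lstsq(A, target, rcond=None)

rel_err = np.linalg.norm(A @ pseudo_q - target) / np.linalg.norm(target)
print(f"relative residual: {rel_err:.2e}")
```

Because there are more surface sites than sample points here, the least-squares fit reproduces the averaged potential at the sample points essentially exactly; a realistic setup would also fit dipoles and constrain the field throughout the QM region.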
Assessment of Resource Quality for Service Level Agreements in Life Science Grids
Pub Date : 2011-12-05 DOI: 10.1109/ESCIENCEW.2011.17
Tibor Kálmán
This article focuses on measuring, describing, monitoring and publishing the quality and performance of grid resources. Life science communities can establish Service Level Agreements (SLAs) with their resource providers to ensure the delivery of services. For this, both the life science communities and their providers need to understand and quantify the performance and service quality of different grid environments. However, measuring service quality in grid infrastructures that use different middleware, as in the German Grid Initiative (D-Grid), is a complex problem. We describe the quality metrics currently used by the German life science communities MediGRID, Services@MediGRID and PneumoGrid, and we identify further quality metrics for defining and monitoring grid resource quality in D-Grid. Quality information should be published and exchanged through grid information systems, which are the entry points to grid services. We therefore also present how quality information can be handled by the GLUE v2.0 Schema, the upcoming standard data model for grid information systems. Two approaches are discussed for measuring and monitoring the quality metrics in multi-middleware environments. The first extracts quality information from an external benchmarking system and loads it into the grid information systems. The second targets life science communities that do not use legacy benchmarking systems but operate traditional monitoring systems such as Nagios.
Citations: 0
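Availability is one quality metric that a conventional monitoring system such as Nagios can feed directly. The paper's own metric definitions are not given in the abstract, so this is a minimal sketch under stated assumptions. It assumes periodic checks report the standard Nagios plugin return codes, and it defines availability as the share of non-UNKNOWN checks in an "up" state. The resource names and the SLA target are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


@dataclass(frozen=True)
class Check:
    resource: str  # hypothetical resource identifier, e.g. a compute element
    state: int     # Nagios return code of one periodic check


def availability(checks: Iterable[Check], resource: str,
                 count_warning_as_up: bool = True) -> float:
    """Fraction of usable checks in which `resource` counted as available.

    UNKNOWN results are left out of the denominator (an assumed policy).
    """
    up_states = {OK, WARNING} if count_warning_as_up else {OK}
    states = [c.state for c in checks
              if c.resource == resource and c.state != UNKNOWN]
    if not states:
        raise ValueError(f"no usable checks for {resource!r}")
    return sum(s in up_states for s in states) / len(states)


def meets_sla(measured: float, target: float) -> bool:
    """True if the measured availability reaches the agreed SLA target."""
    return measured >= target
```

For eight OK, one WARNING, one CRITICAL and one UNKNOWN result, availability is 0.9 when warnings count as "up" and 0.8 when they do not. Either value would miss a hypothetical 95% SLA target. Whether WARNING counts as available is a policy choice that the SLA itself would have to fix.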
Gaming for (Citizen) Science: Exploring Motivation and Data Quality in the Context of Crowdsourced Science through the Design and Evaluation of a Social-Computational System
Pub Date : 2011-12-05 DOI: 10.1109/ESCIENCEW.2011.14
Nathan R. Prestopnik, Kevin Crowston
Citizen Sort, currently under development, is a web-based social-computational system designed to support a citizen science task: the taxonomic classification of various insect, animal, and plant species. Beyond this natural science objective, the Citizen Sort platform will also support information science research on the motivation to participate in social computation and citizen science. In particular, this research program addresses the use of games to motivate participation in social-computational citizen science and explores the effects of system design on motivation and data quality. A design science approach, in which IT artifacts are developed to solve problems and answer research questions, is described. Research questions, progress on Citizen Sort planning and implementation, and key challenges are discussed.
Citations: 64
A Flexible Database-Centric Platform for Citizen Science Data Capture
Pub Date : 2011-12-05 DOI: 10.1109/ESCIENCEW.2011.15
C. Ellul, L. Francis, M. Haklay
The paper describes a platform developed over the past five years by the Extreme Citizen Science (ExCiteS) group at University College London to facilitate online data capture by citizen scientists in the context of community science, where local environmental problems are monitored. In response to user needs, the platform has been developed to be as flexible as possible in the types of data it can capture; these currently include numbers, text, video, photographs, pull-down lists, multiple-selection lists and so forth. Live data feeds and links to social networks such as Twitter have also been incorporated. The platform is database-centric and thus allows data from multiple devices (currently Web and mobile) to be captured and stored in one central location. All map-based data is captured and held in native spatial data format inside the database. To support citizen science activity, the system has been designed so that new projects can be added without additional development (programming), and an administration tool has been developed to support this task. Each project is allocated custom themes depending on its requirements, and a variety of 'skins' can be configured to give the website a different appearance in each case. The platform is currently used by over 20 different groups within the United Kingdom, though mostly for social and perceptual data collection rather than scientific data collection. After demonstrating its use in an urban noise study, it is now being adapted for use in air pollution studies. An extension to mobile devices (Android) is also under development.
Citations: 10
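According to the abstract, new projects can be added without programming, and the data types captured include numbers, text and pull-down lists. One common way to achieve this is to store each project's form definition as data and validate submissions with generic code. This minimal sketch assumes that design. The project definition, field names and validation rules are hypothetical and do not reflect the platform's actual schema.

```python
from typing import Any, Dict, List

# Hypothetical project definition stored as data (e.g., rows in the central
# database), so adding a project means adding configuration, not writing code.
NOISE_PROJECT: Dict[str, Any] = {
    "name": "urban-noise",
    "fields": {
        "decibels": {"type": "number", "required": True},
        "description": {"type": "text", "required": False},
        "source": {"type": "choice", "required": True,
                   "options": ["traffic", "construction", "other"]},
    },
}


def validate(project: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Return the problems found in a submitted record (an empty list means valid)."""
    errors = []
    for name, spec in project["fields"].items():
        if name not in record:
            if spec.get("required"):
                errors.append(f"missing required field '{name}'")
            continue
        value, kind = record[name], spec["type"]
        if kind == "number" and (isinstance(value, bool)
                                 or not isinstance(value, (int, float))):
            errors.append(f"'{name}' must be a number")
        elif kind == "text" and not isinstance(value, str):
            errors.append(f"'{name}' must be text")
        elif kind == "choice" and value not in spec["options"]:
            errors.append(f"'{name}' must be one of {spec['options']}")
    for name in record:
        if name not in project["fields"]:
            errors.append(f"unknown field '{name}'")
    return errors
```

A new project (for example, an air pollution survey) would then need only a new definition dictionary. The same `validate` function serves every project.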
Mechanisms for Data Quality and Validation in Citizen Science
Pub Date : 2011-12-05 DOI: 10.1109/ESCIENCEW.2011.27
A. Wiggins, Greg Newman, R. Stevenson, Kevin Crowston
Data quality is a primary concern for researchers who employ a public participation in scientific research (PPSR), or "citizen science", approach. This mode of scientific collaboration relies on contributions from a large, often unknown population of volunteers with variable expertise. In a survey of PPSR projects, we found that most projects employ multiple mechanisms to ensure data quality and appropriate levels of validation. Based on the authors' direct experience and a review of the survey data, we created a framework of 18 mechanisms commonly employed by PPSR projects to ensure data quality. The framework notes two categories of error sources (protocols and participants) and three potential intervention points (before, during and after participation), and can be used to guide project design.
Citations: 188
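Replication is a widely used data-quality mechanism in citizen science. Several volunteers classify the same item, only sufficiently consistent answers are accepted, and the rest go to expert review. The abstract does not list the 18 mechanisms, so whether the framework describes this rule in exactly this form is an open question. This minimal sketch of such an acceptance rule assumes hypothetical thresholds of at least three votes and 60% agreement.

```python
from collections import Counter
from typing import Dict, List, Optional


def consensus(labels: List[str], min_votes: int = 3,
              min_agreement: float = 0.6) -> Optional[str]:
    """Accept the majority label only if enough volunteers agree.

    Returns None when there are too few votes or too little agreement,
    meaning the item should be flagged for expert review.
    """
    if len(labels) < min_votes:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


def triage(submissions: Dict[str, List[str]], **kwargs) -> Dict[str, Optional[str]]:
    """Apply the consensus rule to every item; None marks items needing review."""
    return {item: consensus(votes, **kwargs) for item, votes in submissions.items()}
```

For hypothetical photos voted `["bee", "bee", "wasp"]`, `["bee", "wasp"]` and `["moth", "bee", "wasp"]`, only the first is accepted (as "bee"). The second has too few votes, and the third has no majority. Both are routed to review.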
Accelerating 3D Protein Modeling Using Cloud Computing: Using Rosetta as a Service on the IBM SmartCloud
Pub Date : 2011-12-05 DOI: 10.1109/eScienceW.2011.12
P. Kunszt, L. Malmström, Nicola Fantini, Wibke Sudholt, M. Lautenschlager, Roland Reifler, Stefan Ruckstuhl
Biology as a scientific domain needs a growing amount of computational power, yet not every researcher has local access to high performance computing resources. Today it is easy to buy computing resources on demand from public cloud providers such as Amazon and IBM, paying only for the computing actually used. However, the difficulty of setting up the simulation and operating the virtual infrastructure is often a showstopper for scientists who want to use cloud resources. This gap is filled by innovative software-as-a-service providers such as the ETH spin-off company Cloud Broker GmbH, which give life science researchers more direct access to commercial clouds. Here we report on a joint project between ETH Zurich, IBM and Cloud Broker to perform a large-scale 3D protein model simulation using the application Rosetta on the new IBM SmartCloud Enterprise.
Citations: 3
A Type System for High Performance Communication and Computation
Pub Date : 2011-12-05 DOI: 10.1109/eScienceW.2011.16
G. Eisenhauer, M. Wolf, H. Abbasi, S. Klasky, K. Schwan
The manner in which data is represented, accessed and transmitted affects the efficiency of any computing system. In high performance computing, traditional frameworks like MPI have relied on a relatively static type system with a high degree of a priori knowledge shared among the participants. Modern scientific computing, however, is increasingly distributed and dynamic, requiring the ability to dynamically create multi-platform workflows, to move processing to the data, and to perform both in situ and streaming data analysis. Traditional approaches to data type description and communication in middleware typically either require a priori agreement on data types or resort to highly inefficient representations such as XML; both are insufficient for the new domain of dynamic science. This paper describes a different approach using FFS, a middleware library that implements efficient manipulation of application-level data. FFS provides highly efficient binary data communication, XML-like examination of unknown data, and both third-party and in situ data processing via dynamic code generation. All of these capabilities are fully dynamic at run time and require neither a priori agreements nor knowledge of the exact form of the data being communicated or analyzed.
Citations: 17
Taxonomy of Multiscale Computing Communities
Pub Date : 2011-12-05 DOI: 10.1109/eScienceW.2011.11
D. Groen, S. Zasada, P. Coveney
We present a concise and comprehensive review of research communities that perform multiscale computing. We provide an overview of communities in a range of domains and compare them to assess the level of use of multiscale methods in different research domains. Additionally, we characterize several areas where interdisciplinary multiscale collaboration or the introduction of common, reusable methods could be particularly beneficial. We conclude that multiscale computing has become increasingly popular in recent years, that different communities adopt radically different organizational approaches, and that simulations on a length scale of a few metres and a time scale of a few hours can be found in many of the multiscale research domains. Sharing multiscale methods specifically geared towards these scales between communities may therefore be particularly beneficial.
Citations: 10