
2019 15th International Conference on eScience (eScience): latest publications

Accelerating Scientific Discovery with SCAIGATE Science Gateway
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00085
C. Jiang, David Ojika, Bhavesh Patel, A. Gordon-Ross, H. Lam
The demand for computational accelerators (GPUs, FPGAs, ASICs, etc.) is growing due to the widening variety of datacenter applications fueled by recent scientific breakthroughs that leverage artificial intelligence (AI). While these applications (e.g., in cosmology and physics) continue to achieve record-breaking predictive accuracy thanks to AI's widespread influence, the infrastructure and workflows needed to take them out of research labs and into production and business use cases continue to lag. To address these important infrastructural challenges, we present SCAIGATE, a prototype science gateway with a simplified workflow aimed at facilitating model building and validation in large-scale scientific applications.
Citations: 2
EDISON Data Science Framework (EDSF) Extension to Address Transversal Skills Required by Emerging Industry 4.0 Transformation
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00076
Y. Demchenko, T. Wiktorski, J. Cuadrado-Gallego, Steve Brewer
The emerging data-driven economy (also defined as Industry 4.0 or simply 4IR), encompassing industry, research and business, requires new types of specialists able to support all stages of the data lifecycle, from data production and input, through data processing, to actionable results delivery, visualisation and reporting; these can be collectively defined as the Data Science family of professions. Data Science as a research and academic discipline provides a basis for Data Analytics and ML/AI applications. The education and training of the data-related professions must reflect all the multi-disciplinary knowledge and competences required of Data Science and data-handling practitioners in modern, data-driven research and the digital economy. In an era of ever-faster technology change matched by strong demand for skills, Data Science education and training programmes should be customizable and deliverable in multiple forms, tailored for different categories of professional roles and profiles. Referring to other publications by the authors on building customizable and interoperable Data Science curricula for different types of learners and target application domains, this paper focuses on defining a set of transversal competences and skills required of modern and future Data Science professions. These include workplace and professional skills covering the critical thinking, problem solving, and creativity required to work in highly automated and dynamic environments. The proposed approach is based on the EDISON Data Science Framework (EDSF), initially developed within the EU-funded EDISON project and currently being further developed in the EU-funded MATES and FAIRsFAIR projects.
Citations: 6
Support for HTCondor high-Throughput Computing Workflows in the REANA Reusable Analysis Platform
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00091
Rokas Maciulaitis, T. Simko, P. Brenner, Scott S. Hampton, M. Hildreth, K. H. Anampa, Irena Johnson, Cody Kankel, Jan Okraska, D. Rodríguez
REANA is a reusable and reproducible data analysis platform allowing researchers to structure their analysis pipelines and run them on remote containerised compute clouds. REANA supports several different workflow systems (CWL, Serial, Yadage) and uses Kubernetes as its job execution backend. We have designed an abstract job execution component that extends the REANA platform's job execution capabilities to support multiple compute backends. We have tested the abstract job execution component with HTCondor and verified the scalability of the designed solution. The results show that the REANA platform would be able to support hybrid scientific workflows in which different parts of the analysis pipeline are executed on different computing backends.
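An abstract job execution component of the kind described can be pictured, in broad strokes, as a common interface with per-backend implementations selected at dispatch time. The sketch below is a minimal illustration in Python; the class and function names are invented for illustration and are not REANA's actual API.

```python
from abc import ABC, abstractmethod


class JobExecutor(ABC):
    """Hypothetical abstract job-execution component: one interface,
    multiple compute backends (names are illustrative, not REANA's)."""

    @abstractmethod
    def submit(self, image: str, command: str) -> str:
        """Submit a containerised job; return a backend job id."""


class KubernetesExecutor(JobExecutor):
    def submit(self, image, command):
        # A real implementation would create a Kubernetes Job object.
        return f"k8s-job:{image}:{command}"


class HTCondorExecutor(JobExecutor):
    def submit(self, image, command):
        # A real implementation would write a submit file for condor_submit.
        return f"condor-job:{image}:{command}"


BACKENDS = {"kubernetes": KubernetesExecutor, "htcondor": HTCondorExecutor}


def run_step(backend: str, image: str, command: str) -> str:
    """Dispatch one workflow step to the backend named in the workflow spec."""
    return BACKENDS[backend]().submit(image, command)
```

The point of the abstraction is that workflow steps name a backend, not a submission mechanism, so new backends slot in without touching the workflow engine.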
Citations: 8
Modeling and Matching Digital Data Marketplace Policies
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00078
Sara Shakeri, Valentina Maccatrozzo, L. Veen, R. Bakhshi, L. Gommans, C. D. Laat, P. Grosso
Recently, Digital Data Marketplaces (DDMs) have been gaining wide attention as sharing platforms among different organizations. This is because sharing information and participating in research collaborations play an important role in addressing multiple scientific challenges. To increase trust among participating organizations, contracts and agreements must be established that determine regulations and policies about who has access to what. Describing these agreements in a general model applicable across different DDMs is of utmost importance. In this paper, we present a semantic model for describing access policies by means of semantic web technologies. In particular, we use and extend the Open Digital Rights Language (ODRL) to describe the pre-established agreements in a DDM.
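An ODRL policy of the kind the paper builds on can be serialised as JSON-LD and matched against access requests. The sketch below is a minimal illustration: the field names follow the ODRL Information Model, but the naive matcher is an assumption for illustration, not the paper's semantic reasoner.

```python
# A toy ODRL-style agreement granting one organization read access
# to one dataset (URIs are example values).
policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "permission": [{
        "target": "http://example.org/dataset/1",
        "assignee": "http://example.org/org/A",
        "action": "read",
    }],
}


def is_permitted(policy, assignee, action, target):
    """Return True if some permission rule grants (assignee, action, target).

    A real matcher would also evaluate constraints, duties, and
    prohibitions; this checks only exact-match permissions.
    """
    return any(
        p["assignee"] == assignee and p["action"] == action and p["target"] == target
        for p in policy.get("permission", [])
    )
```

A request from any other organization, or for any other action, falls through to a deny-by-default answer.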
Citations: 7
SATVAM: Toward an IoT Cyber-Infrastructure for Low-Cost Urban Air Quality Monitoring
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00014
Yogesh L. Simmhan, M. Hegde, Rajesh Zele, S. Tripathi, S. Nair, S. Monga, R. Sahu, Kuldeep Dixit, R. Sutaria, Brijesh Mishra, Anamika Sharma, A. Svr
Air pollution is a public health emergency in large cities. The availability of commodity sensors and the advent of the Internet of Things (IoT) enable the deployment of a city-wide network of thousands of low-cost real-time air quality monitors to help manage this challenge. This needs to be supported by an IoT cyber-infrastructure for reliable and scalable data acquisition from the edge to the Cloud. The low accuracy of such sensors also motivates the need for data-driven calibration models that can accurately predict the science variables from the raw sensor signals. Here, we offer our experiences with designing and deploying such an IoT software platform and calibration models, and validate them through a pilot field deployment in two mega-cities, Delhi and Mumbai. Our edge data service is able to even out the differential bandwidths from the sensing devices to the Cloud repository, and to recover from transient failures. Our analytical models reduce the sensors' errors from a best case of 63% using the factory baseline to as low as 21%, substantially advancing the state of the art in this domain.
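As a concrete (much simplified) illustration of data-driven calibration, a linear model can be fitted by ordinary least squares to map raw sensor readings onto co-located reference-instrument values. This is only a stand-in under assumed data: the paper's calibration models are more elaborate than a univariate line.

```python
def fit_calibration(raw, reference):
    """Fit y = a*x + b mapping raw sensor readings to reference values
    by ordinary least squares (closed form for one predictor)."""
    n = len(raw)
    mx = sum(raw) / n
    my = sum(reference) / n
    a = sum((x - mx) * (y - my) for x, y in zip(raw, reference)) / sum(
        (x - mx) ** 2 for x in raw
    )
    b = my - a * mx
    return a, b


def apply_calibration(raw, a, b):
    """Convert raw readings into calibrated science-variable estimates."""
    return [a * x + b for x in raw]
```

In practice the fit would use reference data collected during co-location, and the model would be re-validated per sensor and per season.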
Citations: 6
Towards a Computer-Interpretable Actionable Formal Model to Encode Data Governance Rules
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00082
Rui Zhao, M. Atkinson
Driven by the needs of science and business, data sharing and re-use have become intensive activities across many areas. In many cases, governance imposes rules concerning data use, but no existing computational technique helps data users comply with such rules. We argue that intelligent systems can improve this situation by recording provenance during processing, encoding the rules, and performing reasoning. We present our initial work designing formal models for data rules and flow rules and a reasoning system, as a first step towards helping data providers and data users sustain productive relationships.
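One way to picture the intended interplay of provenance records and encoded rules is a taint-style check: data derived from a restricted source inherits the restriction, and a rule forbids publishing anything tainted. The sketch below is purely illustrative; the record schema and the rule are invented, and the paper's formal models are far richer.

```python
# Hypothetical provenance log collected during processing: each record
# names an action, its input artefact, and its output artefact.
provenance = [
    {"action": "derive", "input": "patient_db", "output": "stats"},
    {"action": "publish", "input": "stats", "output": "report"},
]


def violates_no_publish(provenance, restricted):
    """Rule: data derived from a restricted source must not be published.

    Propagates the restriction along derivations, then flags any
    'publish' action whose input is tainted.
    """
    tainted = set(restricted)
    for rec in provenance:
        if rec["input"] in tainted:
            if rec["action"] == "publish":
                return True
            tainted.add(rec["output"])
    return False
```

A reasoning system in this spirit could evaluate many such rules over the same provenance log and report which actions caused which violations.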
Citations: 4
Timing is Everything: Identifying Diverse Interaction Dynamics in Scenario and Non-Scenario Meetings
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00029
Chreston A. Miller, Christa Miller
In this paper we explore the use of temporal patterns to define interaction dynamics between different kinds of meetings. Meetings occur on a daily basis and include different behavioral dynamics between participants, such as floor shifts and intense dialog. These dynamics can tell a story of the meeting and provide insight into how participants interact. We focus our investigation on defining diversity metrics to compare the interaction dynamics of scenario and non-scenario meetings. These metrics may be able to provide insight into the similarities and differences between scenario and non-scenario meetings. We observe that certain interaction dynamics can be identified through temporal patterns of speech intervals, i.e., when a participant is talking. We apply the principles of Parallel Episodes in identifying moments of speech overlap, e.g., interaction "bursts", and introduce Situated Data Mining, an approach for identifying repeated behavior patterns based on situated context. Applying these algorithms provides an overview of certain meeting dynamics and defines metrics for meeting comparison and diversity of interaction. We tested on a subset of the AMI corpus and developed three diversity metrics to describe similarities and differences between meetings. These metrics also present the researcher with an overview of interaction dynamics and presents points-of-interest for analysis.
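The notion of speech-overlap "bursts" can be illustrated with a sweep over per-participant speech intervals that reports the spans where two or more participants talk at once. This is a simplified stand-in for the Parallel Episodes machinery, not the authors' implementation.

```python
def overlap_bursts(intervals):
    """Given speech intervals as (speaker, start, end), return the time
    spans during which two or more participants speak simultaneously."""
    events = []
    for _, start, end in intervals:
        events.append((start, 1))   # a speaker starts
        events.append((end, -1))    # a speaker stops
    # Sorting (time, delta) puts -1 before 1 at equal times, so
    # back-to-back turns are not counted as overlap.
    events.sort()
    bursts, active, burst_start = [], 0, None
    for t, delta in events:
        active += delta
        if active >= 2 and burst_start is None:
            burst_start = t
        elif active < 2 and burst_start is not None:
            bursts.append((burst_start, t))
            burst_start = None
    return bursts
```

Counts, durations, and placement of such bursts could then feed diversity metrics for comparing meetings.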
Citations: 3
Toward an Elastic Data Transfer Infrastructure
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00036
Joaquín Chung, Zhengchun Liu, R. Kettimuthu, Ian T Foster
Data transfer over wide area networks is an integral part of many science workflows that must, for example, move data from scientific facilities to remote resources for analysis, sharing, and storage. Yet despite continued enhancements in data transfer infrastructure (DTI), our previous analyses of approximately 40 billion GridFTP command logs collected over four years from the Globus transfer service show that data transfer nodes (DTNs) are idle (i.e., are performing no transfers) 94.3% of the time. On the other hand, we have also observed periods in which CPU resource scarcity negatively impacts DTN throughput. Motivated by the opportunity to optimize DTI performance, we present here an elastic DTI architecture in which the pool of nodes allocated to DTN activities expands and shrinks over time, based on demand. Our results show that this elastic DTI can save up to ~95% of resources compared with a typical static DTN deployment, with the median slowdown incurred remaining close to one for most of the evaluated scenarios.
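The expand-and-shrink behaviour of an elastic DTN pool can be caricatured as a demand-driven sizing rule: grow when active transfers exceed capacity, shrink when nodes sit idle. The function, thresholds, and parameters below are illustrative assumptions, not the policy evaluated in the paper.

```python
import math


def resize_pool(active_transfers, min_nodes=1, max_nodes=16,
                transfers_per_node=4):
    """Toy autoscaling rule for an elastic DTN pool: return the number
    of nodes needed for the current demand, clamped to pool limits."""
    needed = math.ceil(active_transfers / transfers_per_node)
    return min(max_nodes, max(min_nodes, needed))
```

The savings the paper reports come from exactly this kind of behaviour: since DTNs are idle most of the time, a pool sized to demand spends most of its life near `min_nodes`.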
Citations: 1
Workflow Automation in Liquid Chromatography Mass Spectrometry
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00095
R. Gentz, H. Martín, Edward Baidoo, S. Peisert
We describe a fully automated workflow developed for the ingest and analysis of liquid chromatography mass spectrometry (LCMS) data. With this computational workflow, we were able to replace two person-days of manual data analysis with two hours of unsupervised computation. In addition, the tool can compute confidence intervals for all of its results, based on the noise level present in the data. We leverage only open source tools and libraries in this workflow.
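The abstract does not specify how the confidence intervals are derived from the noise level, so here is one generic possibility as a sketch: a percentile-bootstrap interval over replicate measurements, using only the Python standard library. It is an assumption for illustration, not the authors' method.

```python
import random
import statistics


def bootstrap_ci(samples, n_resamples=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for the mean of noisy
    replicate measurements (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(samples, k=len(samples)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi
```

Any quantity the pipeline reports per peak or per metabolite could be wrapped this way to attach an interval reflecting measurement noise.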
Citations: 1
Teaching DevOps and Cloud Based Software Engineering in University Curricula
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00075
Y. Demchenko, Zhiming Zhao, Jayachander Surbiryala, Spiros Koulouzis, Zeshun Shi, X. Liao, Jelena Gordiyenko
This paper presents recommendations on the design and pilot implementation of DevOps and Cloud based Software Development curricula for Computer Science and Software Engineering master's programmes. The central part of the proposed approach is the Body of Knowledge in DevOps technologies for Software Engineering (DevOpsSE-BoK), which defines the set of Knowledge Areas and Knowledge Units required for SE professionals to work effectively as DevOps engineers or application developers. Defining the DevOpsSE-BoK provides a basis for defining the required professional competences and skills and allows consistent curricula structuring and profiling. The paper also reports on the experience of the first course, run in the 2018/2019 academic year at the University of Amsterdam. It presents the structure of the course and explains the instructional methodologies used for course development, such as project-based learning, which builds students' team-based skills both in mastering the Agile development process and in sharing skills. The paper provides a short summary of commonly used DevOps definitions, concepts, models and tools, focusing specifically on cloud based DevOps tools for software development, deployment and operation that enable the core DevOps principles of continuous development and continuous improvement, which are critical for modern agile data-driven companies.
{"title":"Teaching DevOps and Cloud Based Software Engineering in University Curricula","authors":"Y. Demchenko, Zhiming Zhao, Jayachander Surbiryala, Spiros Koulouzis, Zeshun Shi, X. Liao, Jelena Gordiyenko","doi":"10.1109/eScience.2019.00075","DOIUrl":"https://doi.org/10.1109/eScience.2019.00075","url":null,"abstract":"This paper presents recommendations on the design and pilot implementation of the DevOps and Cloud based Software Development curricula for Computer Science and Software Engineering masters. The central part of proposed approach is the Body of Knowledge in the DevOps technologies for Software Engineering (DevOpsSE BoK) that defines a set Knowledge Areas and Knowledge Units required for SE professionals to work efficiently as DevOps engineer or application developer. Defining DevOpsSE-BoK provides a basis for defining required professional competences and skills and allows consistent curricula structuring and profiling. The paper also reports on the experience of the first course run on 2018/2019 academic year at the University of Amsterdam. The paper presents the structure of the course and explains what instructional methodologies have been used for course development, such as project based learning that facilitates the students' team based skills both in mastering Agile development process and skills sharing. 
The paper provides a short summary of the generally used DevOps definitions, concepts, models and tools, specifically focusing on the cloud based DevOps tools for software development, deployment and operation that allows the main DevOps principle of continuous development and continuous improvement which are critical for modern agile data driven companies.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124418813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Journal
2019 15th International Conference on eScience (eScience)