
2018 IEEE 14th International Conference on e-Science (e-Science): Latest Papers

Remote Cloud-Based Automated Stroke Rehabilitation Assessment Using Wearables
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00063
Shane Halloran, J. Shi, Yu Guan, Xi Chen, Michael Dunne-Willows, J. Eyre
We outline a system that enables accurate remote assessment of stroke rehabilitation levels from wrist-worn accelerometer time series data. The system is built on features generated by clustering models applied across sliding windows of the data, and it performs its computation in the cloud. Predictive models are built using advanced machine learning techniques.
Pages: 302
Citations: 1
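The abstract above mentions features generated from clustering models across sliding windows. A minimal sketch of that general idea follows, using only the Python standard library. The window summary (mean and standard deviation), the plain k-means routine, the cluster count, and the synthetic signal are all assumptions made for illustration; the abstract does not say which clustering model or features the authors used.

```python
import math
import random
from collections import Counter


def sliding_windows(samples, width, step):
    """Split a 1-D series into overlapping windows of `width` samples."""
    return [samples[i:i + width]
            for i in range(0, len(samples) - width + 1, step)]


def summarize(window):
    """Describe one window by its mean and standard deviation (assumed features)."""
    mean = sum(window) / len(window)
    variance = sum((x - mean) ** 2 for x in window) / len(window)
    return (mean, math.sqrt(variance))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, [nearest(p) for p in points]


def cluster_histogram(labels, k):
    """Fraction of windows assigned to each cluster: one feature vector per recording."""
    counts = Counter(labels)
    return [counts[j] / len(labels) for j in range(k)]


# Synthetic stand-in for accelerometer magnitude: a calm half, then an active half.
rng = random.Random(42)
signal = ([rng.gauss(1.0, 0.1) for _ in range(300)]
          + [rng.gauss(1.0, 0.6) for _ in range(300)])

windows = sliding_windows(signal, width=50, step=25)
points = [summarize(w) for w in windows]
centroids, labels = kmeans(points, k=3)
features = cluster_histogram(labels, k=3)
```

In a pipeline of this shape, `features` would be the per-recording input to a downstream predictive model; which model that is remains unspecified in the abstract.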
Utilizing a Transparency-Driven Environment Toward Trusted Automatic Genre Classification: A Case Study in Journalism History
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00137
A. Bilgin, L. Hollink, J. V. Ossenbruggen, E. T. K. Sang, Kim Smeenk, Frank Harbers, M. Broersma
With the growing abundance of unlabeled data in real-world tasks, researchers have to rely on the predictions given by black-boxed computational models. However, it is an often neglected fact that these models may be scoring high on accuracy for the wrong reasons. In this paper, we present a practical impact analysis of enabling model transparency by various presentation forms. For this purpose, we developed an environment that empowers non-computer scientists to become practicing data scientists in their own research field. We demonstrate the gradually increasing understanding of journalism historians through a real-world use case study on automatic genre classification of newspaper articles. This study is a first step towards trusted usage of machine learning pipelines in a responsible way.
Pages: 486-496
Citations: 2
Power Asymmetries of eHumanities Infrastructures
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00103
Max Kemman
Digital research infrastructures simultaneously enable and confine the research practices of scholars, constituting a power relation. This power relation can be characterised as a power asymmetry, with scholars dependent on the developers of infrastructures. In order to reduce this power asymmetry, infrastructures are developed in collaboration between scholars and computational researchers. Through an analysis of over twenty interviews, I will investigate the role of knowledge asymmetry, the ignorance of how a collaborator performs their tasks, and how this relates to power asymmetry in eScience collaborations in digital history. I will moreover consider how these asymmetries pose a challenge in the development and adoption of research infrastructures in the humanities.
Pages: 370-371
Citations: 0
Preserving Reproducibility: Provenance and Executable Containers in DataONE Data Packages
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00019
Bryce D. Mecum, Matthew B. Jones, D. Vieglais, C. Willis
Many data packaging standards are available to researchers and data repository operators, and choosing between an existing standard and a new one is challenging. We introduce the DataONE Data Package standard, which is based on the existing OAI-ORE Resource Map standard. We describe the functionality Data Package provides and its implementation considerations, compare it to existing standards, and discuss future extensions to the standard, including the ability to describe execution environments via WholeTale "Tales" and alternate serialization formats.
Pages: 45-49
Citations: 6
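To make the idea of a data package built on an OAI-ORE Resource Map more concrete, the sketch below lists the kind of statements such a map could contain, as plain (subject, predicate, object) tuples. The `ore:describes` and `ore:aggregates` terms come from the OAI-ORE vocabulary; linking metadata to data with the CiTO terms `documents` and `isDocumentedBy` is an assumption about common practice and is not stated in the abstract. The identifiers and the `build_resource_map` helper are hypothetical.

```python
ORE = "http://www.openarchives.org/ore/terms/"
CITO = "http://purl.org/spar/cito/"


def build_resource_map(map_id, metadata_id, data_ids):
    """Return triples for a resource map that packages one metadata
    document together with the data objects it documents."""
    aggregation = f"{map_id}#aggregation"
    triples = [(map_id, ORE + "describes", aggregation)]

    # Every member of the package is aggregated by the same aggregation.
    for member in [metadata_id, *data_ids]:
        triples.append((aggregation, ORE + "aggregates", member))

    # Assumed convention: metadata and data point at each other.
    for data_id in data_ids:
        triples.append((metadata_id, CITO + "documents", data_id))
        triples.append((data_id, CITO + "isDocumentedBy", metadata_id))
    return triples


# Hypothetical identifiers, for illustration only.
package = build_resource_map(
    map_id="urn:example:resource-map-1",
    metadata_id="urn:example:metadata-1",
    data_ids=["urn:example:data-1", "urn:example:data-2"],
)

for subject, predicate, obj in package:
    print(subject, predicate, obj)
```

A real package would serialize these statements in an RDF format; tuples are used here only to keep the sketch free of third-party dependencies.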
Development of the OMUSE/AMUSE Modeling System
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00105
F. I. Pelupessy, B. V. Werkhoven, G. Oord, S. Zwart, A. V. Elteren, H. Dijkstra
The Oceanographic Multipurpose Software Environment (OMUSE, [1]) is an open source framework developed for oceanographic and other earth system modelling applications. OMUSE provides a homogeneous environment to interface with numerical simulation codes. It was developed at the IMAU (Utrecht) using coupling technology developed for astrophysical applications in the AMUSE project at Leiden Observatory [2,3]. OMUSE simplifies the use and deployment of numerical simulation codes. Furthermore, the design of the OMUSE interfaces (figure 1) allows codes that represent different physics or span different ranges of physical scales to be easily combined in novel numerical experiments. The use cases for OMUSE range from running simple numerical experiments with single codes and adding data analysis tools to model runs, to setting up fairly complicated and strongly coupled solvers for problems that are intrinsically multi-scale and/or require different physics. Here, we present the design of OMUSE and give examples of the types of couplings that can be implemented using OMUSE. The example provided by AMUSE and OMUSE suggests that the same interfacing philosophy can be applied to a more extensive set of disciplines. To facilitate this, a better separation of the core framework and domain-specific code is necessary. We present ongoing work to support meteorological and hydrological applications and the use of the framework as the computational core in the eWatercycle project [4]. For this, adaptations are made to improve the interoperability with existing interface efforts (such as the BMI), and we discuss developments regarding the encapsulation of OMUSE/AMUSE and its component models in containers. This will facilitate installation for first-time users, removing a barrier in this respect. In addition, we anticipate that it will offer more flexible deployment options for the framework.
Pages: 374
Citations: 2
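The abstract above describes a homogeneous interface that lets codes with different physics be combined in one experiment. The sketch below illustrates that general pattern with two toy models that share a common interface and exchange state through a simple coupling loop. It is not the OMUSE or AMUSE API: the class names, method names, relaxation physics, and parameter values are all hypothetical.

```python
from abc import ABC, abstractmethod


class ModelInterface(ABC):
    """Hypothetical uniform interface that every wrapped code exposes."""

    @abstractmethod
    def evolve(self, end_time):
        """Advance the model to `end_time`."""

    @abstractmethod
    def get_state(self):
        """Return the quantity other models may read."""

    @abstractmethod
    def set_forcing(self, value):
        """Accept a boundary value supplied by another model."""


class RelaxationModel(ModelInterface):
    """Toy model: its state relaxes toward the forcing at a fixed rate."""

    def __init__(self, state, rate):
        self.state = state
        self.rate = rate
        self.forcing = state
        self.time = 0.0

    def evolve(self, end_time):
        dt = end_time - self.time
        self.state += self.rate * (self.forcing - self.state) * dt  # one Euler step
        self.time = end_time

    def get_state(self):
        return self.state

    def set_forcing(self, value):
        self.forcing = value


def couple(model_a, model_b, dt, steps):
    """Exchange states, then advance both models, once per coupling step."""
    for step in range(1, steps + 1):
        state_a, state_b = model_a.get_state(), model_b.get_state()
        model_a.set_forcing(state_b)
        model_b.set_forcing(state_a)
        model_a.evolve(step * dt)
        model_b.evolve(step * dt)


ocean = RelaxationModel(state=10.0, rate=0.1)       # slow component
atmosphere = RelaxationModel(state=20.0, rate=0.5)  # fast component
couple(ocean, atmosphere, dt=0.5, steps=40)
print(ocean.get_state(), atmosphere.get_state())
```

Because the coupler only touches the three interface methods, either toy model could be swapped for a wrapper around a real simulation code without changing `couple`, which is the design point the abstract emphasizes.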
Navigating Sea-Ice Timeseries Data using Tracklines
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00115
Brennan Bell, T. Dinter, Vlad Merticariu, B. P. Huu, D. Misev, P. Baumann
Scientists are often interested in sampling buffered regions of data across multiple time-slices in array datacubes. For instance, in studying sea-ice distributions, a string of geographic coordinates with timestamps is requested, representing a sample or ship track line of a measurement campaign. A defined region is sampled around each of those data points using a nearest-neighbour approach in time and a buffer or polygon clipping in the spatial domain. Objectively, such queries can be handled discretely across the time domain, as there is no temporal interpolation, and as a result, the tiling of extracted rasters is well-defined by the tiling of the source data. What happens when the resulting object should also be represented by a 3-D raster, such as in the case where the trackline consists of continuous buffered sampling across the timeseries? Spatio-temporal data is typically stored in chunked 3-D arrays, where multiple time-slices appear in the same "tile" or subarray. Unlike the discrete version, tracing out a polygonally-shaped buffer along a ship's path in a 3-D spatio-temporal datacube leads to shearing across the spatial tiles in the result raster, and this shearing prevents an a priori tiling of the result. Here, we present several approaches to tiling the result raster, and we provide a mathematical investigation of the impact these approaches can have on performance. To substantiate the theoretical investigation, an implementation and performance benchmarks of the different tiling approaches are provided, and the implementation is demonstrated on sea-ice data as a case study. In future work, we discuss different approaches towards parallelization utilizing these techniques as a basis for thread-safety, establishing the results on arbitrary R+ trees and extending these results to R* trees.
Pages: 392
Citations: 0
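The discrete query described in the abstract above, nearest-neighbour selection in time followed by a spatial buffer around each track point, can be sketched as follows. This is a simplified illustration under stated assumptions: the datacube is a nested list indexed as `cube[t][y][x]`, track points are given directly as grid indices rather than geographic coordinates, and the buffer is a square clipped to the grid. The function names are hypothetical, and the sketch does not cover the continuous 3-D case or the tiling strategies the paper investigates.

```python
from bisect import bisect_left


def nearest_time_index(times, t):
    """Index of the time-slice closest to `t`; `times` must be sorted ascending."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier slice.
    return i if times[i] - t < t - times[i - 1] else i - 1


def sample_trackline(cube, times, track, radius):
    """For each (t, y, x) track point, cut a square buffer of `radius`
    cells out of the nearest time-slice, clipped at the grid edges."""
    samples = []
    for t, y, x in track:
        time_slice = cube[nearest_time_index(times, t)]
        rows = range(max(0, y - radius), min(len(time_slice), y + radius + 1))
        cols = range(max(0, x - radius), min(len(time_slice[0]), x + radius + 1))
        samples.append([[time_slice[r][c] for c in cols] for r in rows])
    return samples


# Toy cube: 3 time-slices of a 4 x 5 grid; each cell encodes its own indices.
times = [0.0, 1.0, 2.0]
cube = [[[100 * t + 10 * y + x for x in range(5)] for y in range(4)]
        for t in range(3)]

track = [(0.2, 1, 1), (1.6, 3, 4)]
patches = sample_trackline(cube, times, track, radius=1)
print(patches[0])  # 3 x 3 patch from the first time-slice
print(patches[1])  # 2 x 2 patch from the last time-slice, clipped at the corner
```

Each patch comes from exactly one time-slice, which is why the abstract notes that the discrete case inherits a well-defined tiling from the source data.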
Extracting Flood Maps from Social Media for Assimilation
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00045
Etienne Brangbour, P. Bruneau, S. Marchand-Maillet
This abstract states the position of the Publimape project, and unveils progress achieved since its recent start.
Pages: 272-273
Citations: 5
Big Provenance Stream Processing for Data Intensive Computations
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00039
Isuru Suriarachchi, S. Withana, Beth Plale
In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes one or more of the popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.
Pages: 245-255
Citations: 10
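The abstract above does not spell out the algorithm, so the following is only a sketch of one way a reducer could tolerate out-of-order provenance events: it keeps, for every data item, the set of original inputs it derives from, and propagates new information to already-known descendants whenever an edge arrives. It is a sequential, single-partition illustration, not the authors' parallel algorithm; the class name, the edge format, and the item names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


class ProvenanceReducer:
    """Reduce a stream of fine-grained (source, derived) edges to
    input-to-item links, independent of the order in which edges arrive."""

    def __init__(self, inputs):
        self.inputs = set(inputs)
        self.children = defaultdict(set)  # item -> items derived from it
        self.origins = defaultdict(set)   # item -> original inputs it depends on

    def _origins_of(self, item):
        base = {item} if item in self.inputs else set()
        return base | self.origins[item]

    def add_edge(self, source, derived):
        self.children[source].add(derived)
        stack = [(derived, self._origins_of(source))]
        while stack:
            item, incoming = stack.pop()
            new = incoming - self.origins[item]
            if not new:
                continue
            self.origins[item] |= new
            # Descendants seen earlier (out of order) learn about the new origins.
            for child in self.children[item]:
                stack.append((child, new))

    def backward(self, item):
        """Original inputs that `item` was derived from."""
        return set(self.origins[item])

    def forward(self, original_input):
        """Items that depend on `original_input`."""
        return {item for item, found in self.origins.items() if original_input in found}


edges = [("raw1", "a"), ("a", "b"), ("raw2", "b"), ("b", "out")]

results = set()
for ordering in permutations(edges):
    reducer = ProvenanceReducer(inputs={"raw1", "raw2"})
    for source, derived in ordering:
        reducer.add_edge(source, derived)
    results.add((frozenset(reducer.backward("out")), frozenset(reducer.forward("raw1"))))

print(results)  # a single entry: every arrival order gives the same answer
```

Intermediate items such as `a` and `b` are not needed to answer backward and forward queries against the original inputs, which is the sense in which the fine-grained stream is reduced here.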
A First Look at the JX Workflow Language
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00094
Tim Shaffer, Kyle M. D. Sweeney, Nathaniel Kremer-Herman, D. Thain
Scientific workflows are typically expressed as a graph of logical tasks, each one representing a single program along with its input and output files. This poster introduces JX (JSON eXtended), a declarative language that can express complex workloads as an assembly of sub-graphs that can be partitioned in flexible ways. We present a case study of using JX to represent complex workflows for the Lifemapper biodiversity project. We evaluate partitioning approaches across several computing environments, including ND-Condor, IU-Jetstream, and SDSC-Comet, and show that a coarse partitioning results in faster turnaround times, reduced data transfer, and lower master utilization across all three systems.
Pages: 352-353
Citations: 0
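The abstract above describes workflows as graphs of tasks with input and output files, partitioned into sub-graphs. The sketch below illustrates why coarser partitioning can reduce data transfer: it counts the files that cross a partition boundary for different partition sizes. It uses plain Python dictionaries, not JX syntax, and the task layout, field names, and helper functions are hypothetical.

```python
def partition(tasks, size):
    """Group consecutive tasks into sub-workflows of at most `size` tasks."""
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]


def boundary_files(partitions):
    """Files produced in one partition and consumed in a different one.
    These are the files that would have to move between sub-workflows."""
    producer = {}
    for index, part in enumerate(partitions):
        for task in part:
            for name in task["outputs"]:
                producer[name] = index

    crossing = set()
    for index, part in enumerate(partitions):
        for task in part:
            for name in task["inputs"]:
                if name in producer and producer[name] != index:
                    crossing.add(name)
    return crossing


# A toy six-stage chain: stage i reads f<i> and writes f<i+1>.
tasks = [
    {"command": f"stage{i}", "inputs": [f"f{i}"], "outputs": [f"f{i + 1}"]}
    for i in range(6)
]

fine = boundary_files(partition(tasks, 1))
coarse = boundary_files(partition(tasks, 3))
whole = boundary_files(partition(tasks, 6))
print(len(fine), len(coarse), len(whole))
```

The count of crossing files is only a proxy for transfer cost. The turnaround and utilization results reported in the abstract are empirical measurements that a toy model like this does not reproduce.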
A Survey of Software Metric Use in Research Software Development
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00036
Nasir U. Eisty, G. Thiruvathukal, Jeffrey C. Carver
Background: Breakthroughs in research increasingly depend on complex software libraries, tools, and applications aimed at supporting specific science, engineering, business, or humanities disciplines. The complexity and criticality of this software motivate the need for ensuring quality and reliability. Software metrics are a key tool for assessing, measuring, and understanding software quality and reliability. Aims: The goal of this work is to better understand how research software developers use traditional software engineering concepts, like metrics, to support and evaluate both the software and the software development process. One key aspect of this goal is to identify how the set of metrics relevant to research software corresponds to the metrics commonly used in traditional software engineering. Method: We surveyed research software developers to gather information about their knowledge and use of code metrics and software process metrics. We also analyzed the influence of demographics (project size, development role, and development stage) on these metrics. Results: The survey results, from 129 respondents, indicate that respondents have a general knowledge of metrics. However, their knowledge of specific SE metrics is lacking, and their use of them is even more limited. The most used metrics relate to performance and testing. Even though code complexity often poses a significant challenge to research software development, respondents did not indicate much use of code metrics. Conclusions: Research software developers appear to be interested and see some value in software metrics but may be encountering roadblocks when trying to use them. Further study is needed to determine the extent to which these metrics could provide value in continuous process improvement.
Pages: 212-222
Citations: 9