A Survey of Software Metric Use in Research Software Development
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 212-222 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00036
Nasir U. Eisty, G. Thiruvathukal, Jeffrey C. Carver
Background: Breakthroughs in research increasingly depend on complex software libraries, tools, and applications aimed at supporting specific science, engineering, business, or humanities disciplines. The complexity and criticality of this software motivate the need for ensuring quality and reliability. Software metrics are a key tool for assessing, measuring, and understanding software quality and reliability. Aims: The goal of this work is to better understand how research software developers use traditional software engineering (SE) concepts, like metrics, to support and evaluate both the software and the software development process. One key aspect of this goal is to identify how the set of metrics relevant to research software corresponds to the metrics commonly used in traditional software engineering. Method: We surveyed research software developers to gather information about their knowledge and use of code metrics and software process metrics. We also analyzed the influence of demographics (project size, development role, and development stage) on these metrics. Results: The survey results, from 129 respondents, indicate that respondents have a general knowledge of metrics. However, their knowledge of specific SE metrics is lacking, and their use is even more limited. The most-used metrics relate to performance and testing. Even though code complexity often poses a significant challenge to research software development, respondents did not indicate much use of code metrics. Conclusions: Research software developers appear to be interested in software metrics and to see some value in them, but may be encountering roadblocks when trying to use them. Further study is needed to determine the extent to which these metrics could provide value in continuous process improvement.
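To make the notion of a code metric concrete, here is a minimal sketch (not part of the survey) that approximates two commonly cited metrics, cyclomatic complexity and source lines of code, for a Python module; the chosen set of decision nodes and the blank/comment-line rule are simplifying assumptions.

```python
# Rough code-metric sketch using only the standard library; an illustration,
# not the survey's instrumentation.
import ast
import sys

# Node types counted as decision points (a simplified McCabe-style list).
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While,
                  ast.ExceptHandler, ast.BoolOp, ast.comprehension)

def rough_cyclomatic_complexity(source: str) -> int:
    """Decision points + 1: a rough, module-level approximation of cyclomatic complexity."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1

def source_lines_of_code(source: str) -> int:
    """Count non-blank, non-comment lines: a simple size metric."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))

if __name__ == "__main__":
    # Measure the file given on the command line, or this script itself.
    path = sys.argv[1] if len(sys.argv) > 1 else __file__
    code = open(path, encoding="utf-8").read()
    print(f"{path}: complexity~{rough_cyclomatic_complexity(code)}, "
          f"SLOC={source_lines_of_code(code)}")
```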
ATLAS Trigger and Data Acquisition Upgrades for the High Luminosity LHC
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 358-359 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00097
M. E. Astigarraga
The ATLAS Collaboration
Toward VR Eventscapes for Spatio-Temporal Access to Digital Maritime Heritage
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 413-414 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00129
M. Kraak, Andreas Weber, J. V. Lottum, Y. Engelhardt
This abstract sketches the basic design of a prototype that enables the proper display, exploration, and analysis of historical shipping data in an adaptable WebVR environment. In this environment, users will be able to create visually networked ‘eventscapes’ that allow them to identify spatio-temporal patterns in digitized maritime heritage and similar datasets.
Extracting Flood Maps from Social Media for Assimilation
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 272-273 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00045
Etienne Brangbour, P. Bruneau, S. Marchand-Maillet
This abstract states the position of the Publimape project and outlines the progress achieved since its recent start.
Navigating Sea-Ice Timeseries Data using Tracklines
2018 IEEE 14th International Conference on e-Science (e-Science), p. 392 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00115
Brennan Bell, T. Dinter, Vlad Merticariu, B. P. Huu, D. Misev, P. Baumann
Scientists are often interested in sampling buffered regions of data across multiple time-slices in array datacubes. For instance, in studying sea-ice distributions, a string of geographic coordinates with timestamps is requested, representing a sample or ship track line of a measurement campaign. A defined region is sampled around each of those data points using a nearest-neighbour approach in time and a buffer or polygon clipping in the spatial domain. Such queries can be handled discretely across the time domain, as there is no temporal interpolation, and as a result, the tiling of the extracted rasters is well-defined by the tiling of the source data. What happens when the resulting object should also be represented by a 3-D raster, such as when the trackline consists of continuous buffered sampling across the timeseries? Spatio-temporal data is typically stored in chunked 3-D arrays, where multiple time-slices appear in the same "tile" or subarray. Unlike the discrete version, tracing out a polygonally-shaped buffer along a ship’s path in a 3-D spatio-temporal datacube leads to shearing across the spatial tiles in the result raster, and this shearing prevents an a priori tiling of the result. Here, we present several approaches to tiling the result raster, and we provide a mathematical investigation of the impact these approaches can have on performance. To substantiate the theoretical investigation, we provide an implementation and performance benchmarks of the different tiling approaches, and we demonstrate the implementation on sea-ice data as a case study. As future work, we discuss different approaches to parallelization that use these techniques as a basis for thread-safety, establishing the results on arbitrary R+ trees and extending them to R* trees.
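As a concrete picture of the access pattern described above, the following sketch (our illustration under simplifying assumptions, not the authors' implementation) extracts a square buffered window around each trackline point from a 3-D (time, y, x) datacube, using nearest-neighbour selection in time.

```python
# Trackline sampling sketch: nearest-neighbour in time, square buffer in space.
# Array names and the square-window buffer are illustrative assumptions.
import numpy as np

def sample_trackline(cube, cube_times, track, buffer_px=2):
    """cube: ndarray (t, y, x); cube_times: 1-D array of slice timestamps;
    track: iterable of (timestamp, row, col); returns a list of 2-D windows."""
    windows = []
    for t, row, col in track:
        k = int(np.argmin(np.abs(cube_times - t)))       # nearest time slice
        r0, r1 = max(row - buffer_px, 0), row + buffer_px + 1
        c0, c1 = max(col - buffer_px, 0), col + buffer_px + 1
        windows.append(cube[k, r0:r1, c0:c1].copy())      # buffered spatial sample
    return windows

# Toy usage: 10 time slices of a 100x100 grid, three track points.
cube = np.random.rand(10, 100, 100)
times = np.arange(10)
track = [(0.2, 50, 50), (3.7, 52, 60), (7.1, 55, 72)]
print([w.shape for w in sample_trackline(cube, times, track)])
```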
Curation of Image Data for Medical Research
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 105-113 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00026
Lasse Wollatz, Mark Scott, Steven J. Johnston, P. Lackie, S. Cox
Microfocus X-ray computed tomography (µCT) and 3D microscopy scanning create scientific data in the form of images, each several tens of gigabytes in size. E-Scientists in medicine require a user-friendly way of storing and accessing the data and its related metadata. Existing management systems allow computer scientists to create automated image workflows through application programming interfaces (APIs) but do not offer an easy alternative for users less familiar with programming. We present a new approach to the management and curation of biomedical image data and related metadata. Our system, Mata, uses a network file share to give users direct access to their data and also provides access to metadata. Mata also enables a variety of visualization options, as required by e-Scientists in medicine.
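As a loose illustration of the file-share idea (an assumption on our part, not Mata's actual layout or schema), the sketch below keeps metadata in a JSON sidecar file next to each image, so both are reachable directly through the mounted share without going through an API.

```python
# Hypothetical sidecar-metadata sketch; directory layout and field names are
# illustrative assumptions, not Mata's design.
import json
import tempfile
from pathlib import Path

def write_metadata(image_path: Path, metadata: dict) -> Path:
    """Store metadata as <image>.json so it sits next to the image on the share."""
    sidecar = image_path.parent / (image_path.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

def read_metadata(image_path: Path) -> dict:
    sidecar = image_path.parent / (image_path.name + ".json")
    return json.loads(sidecar.read_text()) if sidecar.exists() else {}

# A temporary directory stands in here for the mounted network file share.
share = Path(tempfile.mkdtemp())
scan = share / "lung_uct.tiff"
scan.touch()  # placeholder for a multi-gigabyte µCT volume
write_metadata(scan, {"modality": "µCT", "voxel_size_um": 5.0, "subject": "anon-042"})
print(read_metadata(scan))
```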
Visibility Prediction Based on Kilometric NWP Model Outputs Using Machine-Learning Regression
2018 IEEE 14th International Conference on e-Science (e-Science), p. 278 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00048
D. Bari
Low visibility conditions have a strong impact on air and road traffic, and their prediction, particularly of their spatial coverage, is still a challenge for meteorologists. In this study, a visibility estimation product over northern Morocco has been developed from the outputs of the operational NWP model AROME using state-of-the-art machine-learning regression. The performance of the developed model has been assessed, over the continental part only, against real data collected at 37 synoptic stations over 2 years. The analysis of the results shows that the developed visibility model has a strong ability to differentiate between visibilities occurring during daytime and nighttime. However, the KDD-developed model showed limited generalisation across time. The performance evaluation indicates a bias of -9 m, a mean absolute error of 1349 m with a correlation of 0.87, and a root-mean-square error of 2150 m.
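For reference, the quoted verification scores (bias, mean absolute error, root-mean-square error, Pearson correlation) can be computed as in the minimal sketch below; these are the standard definitions in NumPy, not the author's evaluation code, and the sample visibility values are synthetic.

```python
# Standard verification scores for predicted vs. observed visibility (metres).
import numpy as np

def verification_scores(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    return {
        "bias_m": err.mean(),                   # mean error
        "mae_m": np.abs(err).mean(),            # mean absolute error
        "rmse_m": np.sqrt((err ** 2).mean()),   # root-mean-square error
        "corr": np.corrcoef(obs, pred)[0, 1],   # Pearson correlation
    }

# Toy example with synthetic values.
obs = np.array([800., 2000., 5000., 10000., 300.])
pred = np.array([1000., 1800., 6000., 9000., 500.])
print(verification_scores(obs, pred))
```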
Improving LBFGS Optimizer in PyTorch: Knowledge Transfer from Radio Interferometric Calibration to Machine Learning
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 386-387 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00112
S. Yatawatta, H. Spreeuw, F. Diblen
We have modified the LBFGS optimizer in PyTorch based on our experience using the LBFGS algorithm in radio interferometric calibration (SAGECal). We present results showing the performance improvements that these modifications bring to PyTorch in various machine learning applications.
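For context, the sketch below shows how the stock torch.optim.LBFGS interface is driven today: a closure that re-evaluates the loss is passed to step(). This is unmodified PyTorch usage on a toy regression problem, not the authors' modified optimizer; the learning rate, iteration counts, and history size are arbitrary choices.

```python
# Stock PyTorch LBFGS usage on a toy least-squares problem.
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                              history_size=10, line_search_fn="strong_wolfe")

def closure():
    # LBFGS re-evaluates the model several times per step, so the loss and
    # gradient computation are wrapped in a closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)
print("final loss:", float(closure()))
```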
Automating the Placement of Time Series Models for IoT Healthcare Applications
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 290-291 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00056
Lauren Roberts, Peter Michalák, S. Heaps, M. Trenell, D. Wilkinson, P. Watson
There has been a dramatic growth in the number and range of Internet of Things (IoT) sensors that generate healthcare data. These sensors stream high-dimensional time series data that must be analysed in order to provide the insights into medical conditions that can improve patient healthcare. This raises both statistical and computational challenges, including where to deploy the streaming data analytics, given that a typical healthcare IoT system will combine a highly diverse set of components with very varied computational characteristics, e.g. sensors, mobile phones and clouds. Different partitionings of the analytics across these components can dramatically affect key factors such as the battery life of the sensors, and the overall performance. In this work we describe a method for automatically partitioning stream processing across a set of components in order to optimise for a range of factors including sensor battery life and communications bandwidth. We illustrate this using our implementation of a statistical model predicting the glucose levels of type II diabetes patients in order to reduce the risk of hyperglycaemia.
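As a toy illustration of the kind of placement decision involved (a simplification we introduce, not the paper's partitioning method), the sketch below scores each split point of a three-operator linear pipeline on hypothetical battery and bandwidth costs and picks where to cut between sensor and cloud.

```python
# Toy cost-based placement of a linear analytics pipeline across sensor and cloud.
# Operator names, costs, and weights are hypothetical.
OPERATORS = ["filter", "feature_extract", "predict"]
# Per-operator costs: (battery drain when run on the sensor, output rate in kB/s).
OP_COST = {"filter": (1.0, 2.0), "feature_extract": (4.0, 0.5), "predict": (8.0, 0.1)}
RAW_RATE = 10.0  # kB/s sent upstream if nothing runs on the sensor

def split_cost(k, battery_weight=1.0, bandwidth_weight=1.0):
    """Cost when the first k operators run on the sensor and the rest in the cloud."""
    battery = sum(OP_COST[op][0] for op in OPERATORS[:k])
    bandwidth = OP_COST[OPERATORS[k - 1]][1] if k else RAW_RATE
    return battery_weight * battery + bandwidth_weight * bandwidth

best_k = min(range(len(OPERATORS) + 1), key=split_cost)
print(f"Run {OPERATORS[:best_k]} on the sensor; cost = {split_cost(best_k):.1f}")
```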
Machine Learning for Applied Weather Prediction
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 276-277 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00047
S. E. Haupt, J. Cowie, Seth Linden, Tyler C. McCandless, B. Kosović, S. Alessandrini
The National Center for Atmospheric Research (NCAR) has a long history of applying machine learning to weather forecasting challenges. The Dynamic Integrated foreCasting (DICast®) System was one of the first automated weather forecasting engines and is now in use at numerous companies across many applications. Applications at NCAR that build on DICast and other artificial intelligence technologies include renewable energy, surface transportation, and wildland fire forecasting.