A Survey of Software Metric Use in Research Software Development
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 212-222 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00036
Nasir U. Eisty, G. Thiruvathukal, Jeffrey C. Carver
Background: Breakthroughs in research increasingly depend on complex software libraries, tools, and applications aimed at supporting specific science, engineering, business, or humanities disciplines. The complexity and criticality of this software motivate the need for ensuring quality and reliability. Software metrics are a key tool for assessing, measuring, and understanding software quality and reliability. Aims: The goal of this work is to better understand how research software developers use traditional software engineering (SE) concepts, like metrics, to support and evaluate both the software and the software development process. One key aspect of this goal is to identify how the set of metrics relevant to research software corresponds to the metrics commonly used in traditional software engineering. Method: We surveyed research software developers to gather information about their knowledge and use of code metrics and software process metrics. We also analyzed the influence of demographics (project size, development role, and development stage) on these metrics. Results: The survey results, from 129 respondents, indicate that respondents have a general knowledge of metrics. However, their knowledge of specific SE metrics is lacking, and their use is even more limited. The most-used metrics relate to performance and testing. Even though code complexity often poses a significant challenge to research software development, respondents did not indicate much use of code metrics. Conclusions: Research software developers appear to be interested in software metrics and to see some value in them, but may be encountering roadblocks when trying to use them. Further study is needed to determine the extent to which these metrics could provide value in continuous process improvement.
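To make the notion of a code metric concrete, here is a minimal sketch (not part of the survey) that approximates two commonly cited metrics, cyclomatic complexity and source lines of code, for a Python module; the chosen set of decision nodes and the blank/comment-line rule are simplifying assumptions.

```python
# Rough code-metric sketch using only the standard library; an illustration,
# not the survey's instrumentation.
import ast
import sys

# Node types counted as decision points (a simplified McCabe-style list).
DECISION_NODES = (ast.If, ast.IfExp, ast.For, ast.While,
                  ast.ExceptHandler, ast.BoolOp, ast.comprehension)

def rough_cyclomatic_complexity(source: str) -> int:
    """Decision points + 1: a rough, module-level approximation of cyclomatic complexity."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1

def source_lines_of_code(source: str) -> int:
    """Count non-blank, non-comment lines: a simple size metric."""
    return sum(1 for line in source.splitlines()
               if line.strip() and not line.strip().startswith("#"))

if __name__ == "__main__":
    # Measure the file given on the command line, or this script itself.
    path = sys.argv[1] if len(sys.argv) > 1 else __file__
    code = open(path, encoding="utf-8").read()
    print(f"{path}: complexity~{rough_cyclomatic_complexity(code)}, "
          f"SLOC={source_lines_of_code(code)}")
```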
ATLAS Trigger and Data Acquisition Upgrades for the High Luminosity LHC
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 358-359 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00097
M. E. Astigarraga
The ATLAS Collaboration
Toward VR Eventscapes for Spatio-Temporal Access to Digital Maritime Heritage
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 413-414 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00129
M. Kraak, Andreas Weber, J. V. Lottum, Y. Engelhardt
This abstract sketches the basic design of a prototype that enables the proper display, exploration, and analysis of historical shipping data in an adaptable WebVR environment. In this environment, users will be able to create visually networked ‘eventscapes’ that allow them to identify spatio-temporal patterns in digitized maritime heritage and similar datasets.
Extracting Flood Maps from Social Media for Assimilation
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 272-273 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00045
Etienne Brangbour, P. Bruneau, S. Marchand-Maillet
This abstract states the position of the Publimape project and outlines the progress achieved since its recent start.
Navigating Sea-Ice Timeseries Data using Tracklines
2018 IEEE 14th International Conference on e-Science (e-Science), p. 392 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00115
Brennan Bell, T. Dinter, Vlad Merticariu, B. P. Huu, D. Misev, P. Baumann
Scientists are often interested in sampling buffered regions of data across multiple time-slices in array datacubes. For instance, in studying sea-ice distributions, a string of geographic coordinates with timestamps is requested, representing a sample or ship track line of a measurement campaign. A defined region is sampled around each of those data points using a nearest-neighbour approach in time and a buffer or polygon clipping in the spatial domain. Such queries can be handled discretely across the time domain, as there is no temporal interpolation, and as a result, the tiling of the extracted rasters is well-defined by the tiling of the source data. What happens when the resulting object should also be represented by a 3-D raster, such as when the trackline consists of continuous buffered sampling across the timeseries? Spatio-temporal data is typically stored in chunked 3-D arrays, where multiple time-slices appear in the same "tile" or subarray. Unlike the discrete version, tracing out a polygonally-shaped buffer along a ship’s path in a 3-D spatio-temporal datacube leads to shearing across the spatial tiles in the result raster, and this shearing prevents an a priori tiling of the result. Here, we present several approaches to tiling the result raster, and we provide a mathematical investigation of the impact these approaches can have on performance. To substantiate the theoretical investigation, we provide an implementation and performance benchmarks of the different tiling approaches, and we demonstrate the implementation on sea-ice data as a case study. As future work, we discuss different approaches to parallelization that use these techniques as a basis for thread-safety, establishing the results on arbitrary R+ trees and extending them to R* trees.
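As a concrete picture of the access pattern described above, the following sketch (our illustration under simplifying assumptions, not the authors' implementation) extracts a square buffered window around each trackline point from a 3-D (time, y, x) datacube, using nearest-neighbour selection in time.

```python
# Trackline sampling sketch: nearest-neighbour in time, square buffer in space.
# Array names and the square-window buffer are illustrative assumptions.
import numpy as np

def sample_trackline(cube, cube_times, track, buffer_px=2):
    """cube: ndarray (t, y, x); cube_times: 1-D array of slice timestamps;
    track: iterable of (timestamp, row, col); returns a list of 2-D windows."""
    windows = []
    for t, row, col in track:
        k = int(np.argmin(np.abs(cube_times - t)))       # nearest time slice
        r0, r1 = max(row - buffer_px, 0), row + buffer_px + 1
        c0, c1 = max(col - buffer_px, 0), col + buffer_px + 1
        windows.append(cube[k, r0:r1, c0:c1].copy())      # buffered spatial sample
    return windows

# Toy usage: 10 time slices of a 100x100 grid, three track points.
cube = np.random.rand(10, 100, 100)
times = np.arange(10)
track = [(0.2, 50, 50), (3.7, 52, 60), (7.1, 55, 72)]
print([w.shape for w in sample_trackline(cube, times, track)])
```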
Curation of Image Data for Medical Research
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 105-113 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00026
Lasse Wollatz, Mark Scott, Steven J. Johnston, P. Lackie, S. Cox
Microfocus X-ray computed tomography (µCT) and 3D microscopy scanning create scientific data in the form of images, each several tens of gigabytes in size. E-Scientists in medicine require a user-friendly way of storing and accessing the data and its related metadata. Existing management systems allow computer scientists to create automated image workflows through application programming interfaces (APIs) but do not offer an easy alternative for users less familiar with programming. We present a new approach to the management and curation of biomedical image data and related metadata. Our system, Mata, uses a network file share to give users direct access to their data and also provides access to metadata. Mata also enables a variety of visualization options, as required by e-Scientists in medicine.
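As a loose illustration of the file-share idea (an assumption on our part, not Mata's actual layout or schema), the sketch below keeps metadata in a JSON sidecar file next to each image, so both are reachable directly through the mounted share without going through an API.

```python
# Hypothetical sidecar-metadata sketch; directory layout and field names are
# illustrative assumptions, not Mata's design.
import json
import tempfile
from pathlib import Path

def write_metadata(image_path: Path, metadata: dict) -> Path:
    """Store metadata as <image>.json so it sits next to the image on the share."""
    sidecar = image_path.parent / (image_path.name + ".json")
    sidecar.write_text(json.dumps(metadata, indent=2))
    return sidecar

def read_metadata(image_path: Path) -> dict:
    sidecar = image_path.parent / (image_path.name + ".json")
    return json.loads(sidecar.read_text()) if sidecar.exists() else {}

# A temporary directory stands in here for the mounted network file share.
share = Path(tempfile.mkdtemp())
scan = share / "lung_uct.tiff"
scan.touch()  # placeholder for a multi-gigabyte µCT volume
write_metadata(scan, {"modality": "µCT", "voxel_size_um": 5.0, "subject": "anon-042"})
print(read_metadata(scan))
```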
Visibility Prediction Based on Kilometric NWP Model Outputs Using Machine-Learning Regression
2018 IEEE 14th International Conference on e-Science (e-Science), p. 278 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00048
D. Bari
Low visibility conditions have a strong impact on air and road traffic, and their prediction, particularly of their spatial coverage, is still a challenge for meteorologists. In this study, a visibility estimation product over northern Morocco has been developed from the outputs of the operational NWP model AROME using state-of-the-art machine-learning regression. The performance of the developed model has been assessed, over the continental part only, against real data collected at 37 synoptic stations over 2 years. The analysis of the results shows that the developed visibility model has a strong ability to differentiate between visibilities occurring during daytime and nighttime. However, the KDD-developed model showed limited generalisation across time. The performance evaluation indicates a bias of -9 m, a mean absolute error of 1349 m with a correlation of 0.87, and a root-mean-square error of 2150 m.
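For reference, the quoted verification scores (bias, mean absolute error, root-mean-square error, Pearson correlation) can be computed as in the minimal sketch below; these are the standard definitions in NumPy, not the author's evaluation code, and the sample visibility values are synthetic.

```python
# Standard verification scores for predicted vs. observed visibility (metres).
import numpy as np

def verification_scores(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    err = pred - obs
    return {
        "bias_m": err.mean(),                   # mean error
        "mae_m": np.abs(err).mean(),            # mean absolute error
        "rmse_m": np.sqrt((err ** 2).mean()),   # root-mean-square error
        "corr": np.corrcoef(obs, pred)[0, 1],   # Pearson correlation
    }

# Toy example with synthetic values.
obs = np.array([800., 2000., 5000., 10000., 300.])
pred = np.array([1000., 1800., 6000., 9000., 500.])
print(verification_scores(obs, pred))
```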
Improving LBFGS Optimizer in PyTorch: Knowledge Transfer from Radio Interferometric Calibration to Machine Learning
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 386-387 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00112
S. Yatawatta, H. Spreeuw, F. Diblen
We have modified the LBFGS optimizer in PyTorch based on our experience using the LBFGS algorithm in radio interferometric calibration (SAGECal). We present results showing the performance improvements that these modifications bring to PyTorch in various machine learning applications.
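For context, the sketch below shows how the stock torch.optim.LBFGS interface is driven today: a closure that re-evaluates the loss is passed to step(). This is unmodified PyTorch usage on a toy regression problem, not the authors' modified optimizer; the learning rate, iteration counts, and history size are arbitrary choices.

```python
# Stock PyTorch LBFGS usage on a toy least-squares problem.
import torch

model = torch.nn.Linear(10, 1)
x, y = torch.randn(64, 10), torch.randn(64, 1)

optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20,
                              history_size=10, line_search_fn="strong_wolfe")

def closure():
    # LBFGS re-evaluates the model several times per step, so the loss and
    # gradient computation are wrapped in a closure.
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

for _ in range(5):
    optimizer.step(closure)
print("final loss:", float(closure()))
```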
Automating the Placement of Time Series Models for IoT Healthcare Applications
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 290-291 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00056
Lauren Roberts, Peter Michalák, S. Heaps, M. Trenell, D. Wilkinson, P. Watson
There has been a dramatic growth in the number and range of Internet of Things (IoT) sensors that generate healthcare data. These sensors stream high-dimensional time series data that must be analysed in order to provide the insights into medical conditions that can improve patient healthcare. This raises both statistical and computational challenges, including where to deploy the streaming data analytics, given that a typical healthcare IoT system will combine a highly diverse set of components with very varied computational characteristics, e.g. sensors, mobile phones and clouds. Different partitionings of the analytics across these components can dramatically affect key factors such as the battery life of the sensors, and the overall performance. In this work we describe a method for automatically partitioning stream processing across a set of components in order to optimise for a range of factors including sensor battery life and communications bandwidth. We illustrate this using our implementation of a statistical model predicting the glucose levels of type II diabetes patients in order to reduce the risk of hyperglycaemia.
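As a toy illustration of the kind of placement decision involved (a simplification we introduce, not the paper's partitioning method), the sketch below scores each split point of a three-operator linear pipeline on hypothetical battery and bandwidth costs and picks where to cut between sensor and cloud.

```python
# Toy cost-based placement of a linear analytics pipeline across sensor and cloud.
# Operator names, costs, and weights are hypothetical.
OPERATORS = ["filter", "feature_extract", "predict"]
# Per-operator costs: (battery drain when run on the sensor, output rate in kB/s).
OP_COST = {"filter": (1.0, 2.0), "feature_extract": (4.0, 0.5), "predict": (8.0, 0.1)}
RAW_RATE = 10.0  # kB/s sent upstream if nothing runs on the sensor

def split_cost(k, battery_weight=1.0, bandwidth_weight=1.0):
    """Cost when the first k operators run on the sensor and the rest in the cloud."""
    battery = sum(OP_COST[op][0] for op in OPERATORS[:k])
    bandwidth = OP_COST[OPERATORS[k - 1]][1] if k else RAW_RATE
    return battery_weight * battery + bandwidth_weight * bandwidth

best_k = min(range(len(OPERATORS) + 1), key=split_cost)
print(f"Run {OPERATORS[:best_k]} on the sensor; cost = {split_cost(best_k):.1f}")
```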
Machine Learning for Applied Weather Prediction
2018 IEEE 14th International Conference on e-Science (e-Science), pp. 276-277 | Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00047
S. E. Haupt, J. Cowie, Seth Linden, Tyler C. McCandless, B. Kosović, S. Alessandrini
The National Center for Atmospheric Research (NCAR) has a long history of applying machine learning to weather forecasting challenges. The Dynamic Integrated foreCasting (DICast®) System was one of the first automated weather forecasting engines and is now in use at numerous companies across many applications. Applications at NCAR that build on DICast and other artificial intelligence technologies include renewable energy, surface transportation, and wildland fire forecasting.