Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00095
R. Gentz, H. Martín, Edward Baidoo, S. Peisert
We describe the fully automated workflow path developed for the ingest and analysis of liquid chromatography mass spectrometry (LCMS) data. With this computational workflow, we were able to replace two human work days of data analysis with two hours of unsupervised computation time. In addition, the tool can compute confidence intervals for all its results, based on the noise level present in the data. We leverage only open source tools and libraries in this workflow.
Title: Workflow Automation in Liquid Chromatography Mass Spectrometry
Venue: 2019 15th International Conference on eScience (eScience)
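The LCMS abstract above says that confidence intervals are derived from the noise level in the data but does not say how. The following Python sketch shows one common way to do this for a single integrated peak. It is an illustration under stated assumptions, not the authors' method: the function name `peak_area_with_ci`, the rectangle-rule integration, the use of a signal-free baseline region, and the assumption of independent Gaussian noise are all hypothetical.

```python
import math


def peak_area_with_ci(signal, dt, baseline, z=1.96):
    """Integrate a peak and return (area, lower, upper).

    Assumptions (not from the paper): the noise is independent, zero-mean
    and Gaussian; its standard deviation is estimated from a signal-free
    baseline region; the peak is integrated with a rectangle rule.
    The uncertainty of the baseline mean itself is ignored for brevity.
    """
    n_base = len(baseline)
    mean_base = sum(baseline) / n_base
    # Sample standard deviation of the baseline serves as the noise level.
    noise_sd = math.sqrt(
        sum((b - mean_base) ** 2 for b in baseline) / (n_base - 1)
    )
    # Baseline-corrected rectangle-rule integral of the peak.
    area = sum(s - mean_base for s in signal) * dt
    # Independent noise on n samples adds in quadrature.
    area_sd = noise_sd * math.sqrt(len(signal)) * dt
    return area, area - z * area_sd, area + z * area_sd


# Hypothetical intensities sampled every 0.5 time units.
area, lower, upper = peak_area_with_ci(
    signal=[0.5, 2.5, 4.5, 2.5, 0.5],
    dt=0.5,
    baseline=[0.0, 1.0, 0.0, 1.0],
)
```

The default `z=1.96` corresponds to an approximate 95% interval under the Gaussian assumption; a noisier baseline widens the interval without changing the area estimate.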
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00075
Y. Demchenko, Zhiming Zhao, Jayachander Surbiryala, Spiros Koulouzis, Zeshun Shi, X. Liao, Jelena Gordiyenko
This paper presents recommendations on the design and pilot implementation of DevOps and cloud-based software development curricula for Computer Science and Software Engineering master's programs. The central part of the proposed approach is the Body of Knowledge in DevOps technologies for Software Engineering (DevOpsSE-BoK), which defines a set of Knowledge Areas and Knowledge Units required for SE professionals to work efficiently as DevOps engineers or application developers. Defining the DevOpsSE-BoK provides a basis for defining the required professional competences and skills and allows consistent curricula structuring and profiling. The paper also reports on the experience of the first course run in the 2018/2019 academic year at the University of Amsterdam. It presents the structure of the course and explains which instructional methodologies were used for course development, such as project-based learning, which builds the students' team-based skills both in mastering the Agile development process and in skills sharing. The paper provides a short summary of the generally used DevOps definitions, concepts, models and tools, focusing specifically on the cloud-based DevOps tools for software development, deployment and operation that enable the main DevOps principles of continuous development and continuous improvement, which are critical for modern agile data-driven companies.
Title: Teaching DevOps and Cloud Based Software Engineering in University Curricula
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00048
Ravi Shankar, N. Ilangakoon, A. Orenstein, Floriana Ciaglia, N. Glenn, C. Olschanowsky
AdaptLidarTools is a software package that processes full-waveform lidar data. Full-waveform lidar is an active remote sensing technique in which a laser beam is emitted towards a target and the backscattered energy is recorded as a near-continuous waveform. A collection of waveforms from airborne lidar can capture landscape characteristics in three dimensions. For vegetation specifically, the echoes and echo properties extracted from the waveforms can provide scientists with structural (height, volume, layers of canopy, among others) and functional (leaf area index, diversity) characteristics. The discrete waveforms can be transformed into georeferenced 2D rasters (images), allowing scientists to correlate field-based observations for validation of the waveform observations and to fuse the data with other geospatial information. AdaptLidarTools provides an extensible, open-source framework that processes the waveforms and produces multiple data outputs that can be used in vegetation and terrain analysis. AdaptLidarTools is designed to explore new methods to fit full-waveform lidar signals and to maximize the information in the waveforms for vegetation applications. The toolkit explores first differencing, complementary to Gaussian fitting, for faster processing of full-waveform lidar signals and for handling increasingly large volumes of full-waveform lidar datasets. AdaptLidarTools takes approximately 30 min to derive a raster of a given echo property from a 1 GB raw waveform file. The toolkit generates first-order echo properties such as position, amplitude, and pulse width, and other properties such as rise time, fall time and backscattered cross section. It also generates other properties that current proprietary and open-source tools do not. The derived echo properties are delivered as georeferenced raster files of a given spatial resolution that can be viewed and processed by most remote sensing data processing software.
Title: AdaptLidarTools: A Full-Waveform Lidar Processing Suite
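The AdaptLidarTools abstract above names first differencing as a faster complement to Gaussian fitting for locating echoes in a waveform. The Python sketch below illustrates the general idea only: a peak is reported where the first difference changes from positive to non-positive and the amplitude exceeds a noise threshold. The function name `find_echo_peaks`, the `noise_threshold` parameter and the sample waveform are hypothetical and are not taken from the toolkit.

```python
def find_echo_peaks(waveform, noise_threshold):
    """Return indices of local maxima found by first differencing.

    A sample counts as a peak when the waveform was rising into it
    (previous difference > 0), is no longer rising after it
    (next difference <= 0), and its amplitude exceeds noise_threshold.
    """
    # diffs[k] = waveform[k + 1] - waveform[k]
    diffs = [b - a for a, b in zip(waveform, waveform[1:])]
    peaks = []
    for i in range(1, len(diffs)):
        rising_before = diffs[i - 1] > 0
        not_rising_after = diffs[i] <= 0
        if rising_before and not_rising_after and waveform[i] > noise_threshold:
            peaks.append(i)
    return peaks


# Hypothetical digitized return with two echoes on a low noise floor.
waveform = [2, 2, 3, 10, 40, 90, 40, 12, 5, 30, 60, 25, 4, 2, 2]
peaks = find_echo_peaks(waveform, noise_threshold=10)
echo_properties = [(i, waveform[i]) for i in peaks]  # (position, amplitude)
```

Because the scan is a single pass over the samples with no iterative fitting, it is cheap per waveform; properties such as pulse width would still need additional logic that is not shown here.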
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00065
P. Neumann, J. Biercamp
Kilometer-scale ensemble simulations are expected to significantly boost and impact weather and climate predictions in the future. However, these simulations will only be enabled by exascale compute power and corresponding data capacity. In the following, we discuss a European effort, the e-infrastructure Centre of Excellence in Simulation of Weather and Climate in Europe (ESiWACE). ESiWACE provides infrastructural means to prepare the weather and climate communities for simulations at the exascale. We give an overview of several ESiWACE infrastructure components and discuss their role in reaching the goal of kilometer-scale ensemble predictions. In particular, we review the outcomes of the ESiWACE demonstrators, that is, community-driven kilometer-scale models that have been developed over the last few years.
Title: ESiWACE: On European Infrastructure Efforts for Weather and Climate Modeling at Exascale
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00053
E. Laure, Olivia Eriksson, Erik Lindahl, D. Henningson
Since 2010, the Swedish e-Science Research Centre (SeRC) has been funding and coordinating e-Science activities in a broad spectrum of scientific disciplines. After an initial 5-year phase that produced outstanding results, SeRC is increasingly focusing on fostering interactions between disciplines and has created so-called Multidisciplinary Collaborative Programs (MCPs). In these programs, domain researchers collaborate with e-Science method and tool developers and e-Infrastructure providers. In this paper we give an overview of the initial phase of SeRC and present the new programs that started operating in 2019.
Title: The Future of Swedish e-Science: SeRC 2.0
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00064
P. Gschwandtner, Herbert Jordan, Peter Thoman, T. Fahringer
Effectively implementing scientific algorithms in distributed memory parallel applications is a difficult task for domain scientists, as evidenced by the large number of domain-specific languages and libraries available today that attempt to facilitate the process. However, they usually provide a closed set of parallel patterns and are not open for extension without vast modifications to the underlying system. In this work, we present the AllScale API, a programming interface for developing distributed memory parallel applications with the ease of shared memory programming models. The AllScale API is closed for modification but open for extension, allowing new, user-defined parallel patterns and data structures to be implemented based on existing core primitives and therefore fully supported in the AllScale framework. Focusing on high-level functionality directly offered to application developers, we present the advantages of such an API design, detail some of its specifications, and evaluate it using three real-world use cases. Our results show that AllScale decreases the complexity of implementing scientific applications for distributed memory while attaining comparable or higher performance compared to MPI reference implementations.
Title: The AllScale API
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00070
Denny Vrandečić
We propose to use Wikidata to provide metadata for datasets when the traditional approach via Schema.org is not feasible. We describe and discuss the proposal, and believe that the process described in this paper can help with increasing findability and accessibility of certain datasets.
Title: Describing Datasets in Wikidata
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00043
R. Muth, Kerstin Eisenhut, J. Rabe, Florian Tschorsch
Urban development processes often suffer from mistrust amongst different stakeholder groups. The lack of transparency within complex and long-term planning processes and the limited scope for co-creation and joint decision-making constitute a persistent problem for successful participation in urban planning. Civic technology has the potential to improve this predicament. With BBBlockchain, we propose a blockchain-based participation platform that is able to address all layers of participation. In the development of the platform, we focus on two key aspects: how to increase transparency and how to introduce enhanced co-decision-making. To this end, we exploit the immutable nature of blockchains and effectively offer a platform that excludes monopolistic control over information. The decision-making process is governed by smart contracts implementing, for example, timestamping of planning documents, opinion polls, and the management of a participatory budget. Our architecture and prototypes show the operational capabilities of this approach in a series of use cases for urban development.
Title: BBBlockchain: Blockchain-Based Participation in Urban Development
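The BBBlockchain abstract above lists timestamping of planning documents as one smart-contract use case. The paper's contracts are not shown here, so the following Python sketch is only a local stand-in that illustrates why hash-linked records make later tampering detectable. All names (`document_digest`, `append_record`, `verify_chain`), the record layout and the sample documents are hypothetical, and a plain Python list stands in for the blockchain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record


def document_digest(data: bytes) -> str:
    """SHA-256 fingerprint of a planning document."""
    return hashlib.sha256(data).hexdigest()


def _record_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes equally.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain: list, doc_bytes: bytes, timestamp: str) -> dict:
    """Append a timestamp record that is linked to its predecessor."""
    prev_hash = chain[-1]["record_hash"] if chain else GENESIS_HASH
    body = {
        "doc_hash": document_digest(doc_bytes),
        "timestamp": timestamp,
        "prev_hash": prev_hash,
    }
    record = dict(body, record_hash=_record_hash(body))
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edited record breaks the check."""
    prev_hash = GENESIS_HASH
    for record in chain:
        body = {k: record[k] for k in ("doc_hash", "timestamp", "prev_hash")}
        if record["prev_hash"] != prev_hash:
            return False
        if record["record_hash"] != _record_hash(body):
            return False
        prev_hash = record["record_hash"]
    return True


chain = []
append_record(chain, b"zoning plan, draft 1", "2019-03-01T10:00:00Z")
append_record(chain, b"zoning plan, draft 2", "2019-04-15T09:30:00Z")
valid_before = verify_chain(chain)

# Swapping in a different document after the fact is detected.
chain[0]["doc_hash"] = document_digest(b"zoning plan, altered")
valid_after = verify_chain(chain)
```

Only the document fingerprint is stored, so the document itself can stay off-chain. In a real deployment the immutability would come from the blockchain's consensus rather than from this local check.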
Pub Date: 2019-09-01 DOI: 10.1109/eScience.2019.00016
S. Sellars, John Graham, D. Mishin, Kyle Marcus, I. Altintas, T. DeFanti, L. Smarr, Camille Crittenden, F. Wuerthwein, Joulien Tatar, P. Nguyen, E. Shearer, S. Sorooshian, F. M. Ralph
In 2016, a team of earth scientists directly engaged a team of computer scientists to identify cyberinfrastructure (CI) approaches that would speed up an earth science workflow. This paper describes the evolution of that workflow as the two teams bridged CI and an image segmentation algorithm to do large-scale earth science research. The Pacific Research Platform (PRP) and the Cognitive Hardware and Software Ecosystem Community Infrastructure (CHASE-CI) resources were used to significantly decrease the earth science workflow's wall-clock time from 19.5 days to 53 minutes. The improvement in wall-clock time comes from the use of network appliances, improved image segmentation, deployment of a containerized workflow, and the increase in CI experience and training for the earth scientists. This paper presents a description of the evolving innovations used to improve the workflow, the bottlenecks identified within each workflow version, and the improvements made within each version of the workflow over a three-year period.
Title: The Evolution of Bits and Bottlenecks in a Scientific Workflow Trying to Keep Up with Technology: Accelerating 4D Image Segmentation Applied to NASA Data
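To put the reported wall-clock improvement of the earth science workflow above (19.5 days down to 53 minutes) into a single number, the two figures can be converted to the same unit and divided. The short Python calculation below uses only those two values from the abstract; the variable names are illustrative.

```python
# Convert 19.5 days to minutes: days * hours per day * minutes per hour.
before_minutes = 19.5 * 24 * 60
after_minutes = 53

# Speedup factor = old wall-clock time / new wall-clock time.
speedup = before_minutes / after_minutes

print(f"{before_minutes:.0f} min -> {after_minutes} min, about {speedup:.0f}x faster")
# prints "28080 min -> 53 min, about 530x faster"
```

This is the end-to-end factor across all workflow versions; the abstract does not break it down by individual improvement.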