Scientific Partnership: A Pledge for a New Level of Collaboration between Scientists and IT Specialists
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00122 | Pages: 402-402
J. Weismüller, A. Frank
ICT technologies play an increasing role in almost every aspect of science. Adopting new technologies, however, consumes an increasing amount of researchers' time, time they could better spend on their actual research. Yet not adopting new technologies will inevitably lead to biased research, since scientists will not know about all the possibilities and methods that modern technology offers. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts from, e.g., a local computing centre. In contrast to traditional IT service provision, IT experts have to understand the scientists' problems and methods in order to help them select suitable services. If none are available, they can then consider adapting existing services or developing new ones according to the actual needs of the scientists. In addition, the partnership promotes good scientific practice, since the IT experts can ensure reproducibility of the research by professionalising the workflow and applying the FAIR data principles. We elaborate on this dilemma with examples from an IT centre's perspective, and sketch a path towards unbiased research and the development of new IT services tailored to the scientific community.
{"title":"Scientific Partnership: A Pledge for a New Level of Collaboration between Scientists and IT Specialists","authors":"J. Weismüller, A. Frank","doi":"10.1109/eScience.2018.00122","DOIUrl":"https://doi.org/10.1109/eScience.2018.00122","url":null,"abstract":"ICT technologies play an increasing role in almost every aspect of science. The adaption of the new technologies however consume an increasing amount of the researcher's time, time they could better spend on their actual research. Not adapting new technologies however will inevitably lead to biased research, since scientists will not know about all the possibilities and methods that are available from modern technology. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts from i.e. a local computing centre. In contrast to traditional IT service provision, IT experts have to understand the scientific problems and methods of the scientists in order to help them to select suitable services. If none are available, they can then consider adapting existing services or develop new ones according to the actual needs of the scientists. In addition, the partnership helps towards good scientific practice, since the IT experts can ensure reproducibility of the research by professionalising the workflow and applying FAIR data principles. We elaborate on this dilemma with examples from an IT centre's perspective, and sketch a path towards unbiased research and the development of new IT services that are tailored for the scientific community.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"1 1","pages":"402-402"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86862624","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to Bring Value of Domain Specific Big Data in an Interdisciplinary Way? A Software Landscape
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00111 | Pages: 384-385
B. Thage, L. K. Andersen
Digital competences, such as advanced scientific computing and data mining tools, are often anchored in domain-specific research areas. There is substantial overlap in the data types generated across the different fields of science, and hence an opportunity to share knowledge and tools across disciplines. Mapping software programming tools and identifying use cases that include High Performance Computing (HPC) may serve as inspiration for bringing new value to domain-specific data in an interdisciplinary way. This poster shares the experience from Denmark, based on a mapping of the software used in 290 publications that involved HPC.
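As an illustration of the kind of software mapping the poster describes, here is a minimal sketch (not the authors' method; the tool list and file layout are hypothetical) that counts how many publication full texts mention each known software tool:

```python
# Hypothetical sketch: tally which known software tools are mentioned in
# publication full texts. Tool names and directory layout are assumptions.
import re
from collections import Counter
from pathlib import Path

KNOWN_TOOLS = ["GROMACS", "OpenFOAM", "VASP", "Python", "R", "MATLAB"]

def tally_tools(text_dir: str) -> Counter:
    counts = Counter()
    for path in Path(text_dir).glob("*.txt"):
        text = path.read_text(errors="ignore")
        for tool in KNOWN_TOOLS:
            # Word-boundary match so "R" does not match every letter r.
            if re.search(rf"\b{re.escape(tool)}\b", text):
                counts[tool] += 1  # count publications, not occurrences
    return counts

if __name__ == "__main__":
    print(tally_tools("publications/").most_common())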
{"title":"How to Bring Value of Domain Specific Big Data in an Interdisciplinary Way? A Software Landscape","authors":"B. Thage, L. K. Andersen","doi":"10.1109/eScience.2018.00111","DOIUrl":"https://doi.org/10.1109/eScience.2018.00111","url":null,"abstract":"Digital competences, such as advanced scientific computing and data mining tools, are often anchored in domain specific research areas. There is a substantial overlap of data types generated from the different fields of science, and hence there is a possibility for sharing knowledge and tools across disciplins. Mapping of software programming tools and identification of use cases that includes High Performance Computing (HPC) may serve as inspiration in order to bring new value of domain specific data in an interdisciplinary way. This poster share the experience from Denmark based on software mapping in 290 publications that included use of HPC.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"26 1","pages":"384-385"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83224295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modeling Impact of Execution Strategies on Resource Utilization
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00085 | Pages: 340-340
A. Poyda, M. Titov, A. Klimentov, J. Wells, S. Oral, K. De, D. Oleynik, S. Jha
The analysis of the hundreds of petabytes of raw and derived HEP (High Energy Physics) data will necessitate exascale computing. In addition to their unprecedented volume, these data are distributed over hundreds of computing centers. In response to these application and performance requirements, which call for large-scale parallel processing, and as a consequence of technology trends, there has been an increase in the uptake of supercomputers by HEP projects.
{"title":"Modeling Impact of Execution Strategies on Resource Utilization","authors":"A. Poyda, M. Titov, A. Klimentov, J. Wells, S. Oral, K. De, D. Oleynik, S. Jha","doi":"10.1109/eScience.2018.00085","DOIUrl":"https://doi.org/10.1109/eScience.2018.00085","url":null,"abstract":"The analysis of the hundreds of petabytes of raw and derived HEP (High Energy Physics) data will necessitate exascale computing. In addition to unprecedented volume, these data are distributed over hundreds of computing centers. In response to these application requirement, as well as performance requirement by using parallel processing (i.e., parallelism), and as a consequence of technology trends, there has been an increase in the uptake of supercomputers by HEP projects.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"126 1","pages":"340-340"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87815811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Strategies for Modeling Extreme Luminosities in the CMS Simulation
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00090 | Pages: 347-347
M. Hildreth, E. Sexton-Kennedy, K. Pedro, M. Kortelainen
The LHC simulation frameworks are already confronting the High Luminosity LHC (HL-LHC) era. In order to design and evaluate the performance of the HL-LHC detector upgrades, realistic simulations of the future detectors and of the extreme luminosity conditions they may encounter have to be produced now. The use of many individual minimum-bias interactions to model the pileup poses several challenges to the CMS Simulation framework, including huge memory consumption, increased computation time, and the handling of large numbers of event files during Monte Carlo production. Simulating a single hard scatter at an instantaneous luminosity corresponding to 200 pileup interactions per crossing can require the input of thousands of individual minimum-bias events. Brute-force Monte Carlo production requires the overlay of these events for each hard-scatter event simulated.
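For intuition, a minimal sketch of the brute-force overlay idea (illustrative only, not CMS software; event contents and pool size are stand-ins): the number of in-time pileup interactions per crossing is drawn from a Poisson distribution with mean 200, and that many minimum-bias events are drawn from a pre-generated pool and bundled with the hard scatter.

```python
# Illustrative pileup overlay, not CMS code. Events are stand-in strings.
import numpy as np

MEAN_PILEUP = 200  # average pileup interactions per bunch crossing

def overlay(hard_scatter, minbias_pool, rng):
    n_pu = rng.poisson(MEAN_PILEUP)  # in-time pileup multiplicity
    # Sample with replacement; a real production reads thousands of
    # minimum-bias events from files and merges their detector hits.
    picks = rng.choice(len(minbias_pool), size=n_pu)
    return {"hard_scatter": hard_scatter,
            "pileup": [minbias_pool[i] for i in picks]}

rng = np.random.default_rng(42)
pool = [f"minbias_{i}" for i in range(10_000)]
event = overlay("hard_scatter_0", pool, rng)
print(len(event["pileup"]))  # typically close to 200
```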
{"title":"Strategies for Modeling Extreme Luminosities in the CMS Simulation","authors":"M. Hildreth, E. Sexton-Kennedy, K. Pedro, M. Kortelainen","doi":"10.1109/eScience.2018.00090","DOIUrl":"https://doi.org/10.1109/eScience.2018.00090","url":null,"abstract":"The LHC simulation frameworks are already confronting the High Luminosity LHC (HL-LHC) era. In order to design and evaluate the performance of the HL-LHC detector upgrades, realistic simulations of the future detectors and the extreme luminosity conditions they may encounter have to be simulated now. The use of many individual minimum-bias interactions to model the pileup poses several challenges to the CMS Simulation framework, including huge memory consumption, increased computation time, and the necessary handling of large numbers of event files during Monte Carlo production. Simulating a single hard scatter at an instantaneous luminosity corresponding to 200 pileup interactions per crossing can involve the input of thousands of individual minimum-bias events. Brute-force Monte Carlo production requires the overlay of these events for each hard-scatter event simulated.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"1 1","pages":"347-347"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83050970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Resolving Clouds in a Global Atmosphere Model - A Multiscale Approach with Nested Models
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00043 | Pages: 270-270
F. Jansson, G. Oord, P. Siebesma, D. Crommelin
No abstract available.
{"title":"Resolving Clouds in a Global Atmosphere Model - A Multiscale Approach with Nested Models","authors":"F. Jansson, G. Oord, P. Siebesma, D. Crommelin","doi":"10.1109/eScience.2018.00043","DOIUrl":"https://doi.org/10.1109/eScience.2018.00043","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"27 1","pages":"270-270"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87460048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bringing Data Science to Qualitative Analysis
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00076 | Pages: 325-326
Y. Cheah, Drew Paine, D. Ghoshal, L. Ramakrishnan
Qualitative user research is a human-intensive approach that draws upon ethnographic methods from the social sciences to develop insights about work practices and inform software design and development. Recent advances in data science, and in particular natural language processing (NLP), enable machine-generated insights that augment existing techniques. We describe our prototype framework, based in Jupyter, a software tool that supports interactive data science and scientific computing, which leverages NLP techniques to make sense of transcribed texts from user interviews. This work also serves as a starting point for incorporating data science techniques into the qualitative analysis process.
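A minimal sketch of the sort of NLP step such a notebook-based framework might run (illustrative; the transcripts and the choice of scikit-learn topic modeling are assumptions, not the authors' implementation):

```python
# Hypothetical example: surface latent topics in interview transcripts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

transcripts = [  # stand-in snippets; real input would be full transcripts
    "we store the raw instrument data on the shared filesystem",
    "the analysis scripts break whenever the file naming changes",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(transcripts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for i, comp in enumerate(lda.components_):
    top = [terms[j] for j in comp.argsort()[-5:][::-1]]
    print(f"topic {i}: {top}")  # top terms hint at recurring work practices
```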
{"title":"Bringing Data Science to Qualitative Analysis","authors":"Y. Cheah, Drew Paine, D. Ghoshal, L. Ramakrishnan","doi":"10.1109/eScience.2018.00076","DOIUrl":"https://doi.org/10.1109/eScience.2018.00076","url":null,"abstract":"Qualitative user research is a human-intensive approach that draws upon ethnographic methods from social sciences to develop insights about work practices to inform software design and development. Recent advances in data science, and in particular, natural language processing (NLP), enables the derivation of machine-generated insights to augment existing techniques. Our work describes our prototype framework based in Jupyter, a software tool that supports interactive data science and scientific computing, that leverages NLP techniques to make sense of transcribed texts from user interviews. This work also serves as a starting point for incorporating data science techniques in the qualitative analyses process.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"24 1","pages":"325-326"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82203398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weather Reanalysis on an Urban Scale using WRF
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00049 | Pages: 279-280
R. V. Haren, S. Koopmans, G. Steeneveld, N. Theeuwes, R. Uijlenhoet, A. Holtslag
In this study, we improve the performance of the Weather Research and Forecasting (WRF) mesoscale model by incorporating observations from a variety of sources, using data assimilation and nudging techniques at resolutions down to 100 m for urban areas. Our final goal is to create a 15-year climatological urban reanalysis archive of (hydro)meteorological variables for Amsterdam, named ERA-urban. This will enable us to trace trends in thermal comfort and extreme precipitation.
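For intuition, a minimal sketch of the nudging (Newtonian relaxation) idea the study mentions, with made-up values; real WRF nudging is configured within the model and applied per grid point and variable:

```python
# Conceptual nudging sketch, not WRF code; values are invented.
def nudge(model_value, obs_value, dt, tau):
    """Relax model_value toward obs_value with relaxation timescale tau."""
    return model_value + (dt / tau) * (obs_value - model_value)

temp = 290.0           # model 2 m temperature [K]
obs = 288.5            # observed temperature [K]
for _ in range(6):     # six 10-minute steps, 1-hour relaxation timescale
    temp = nudge(temp, obs, dt=600.0, tau=3600.0)
print(round(temp, 2))  # approaches 288.5 without jumping to it
```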
{"title":"Weather Reanalysis on an Urban Scale using WRF","authors":"R. V. Haren, S. Koopmans, G. Steeneveld, N. Theeuwes, R. Uijlenhoet, A. Holtslag","doi":"10.1109/eScience.2018.00049","DOIUrl":"https://doi.org/10.1109/eScience.2018.00049","url":null,"abstract":"In this study, we improve the Weather Research and Forecasting mesoscale model (WRF) performance by incorporating observations of a variety of sources using data assimilation and nudging techniques on a resolution up to 100 meter for urban areas. Our final goal is to create a 15 year climatological urban re-analysis data archive of (hydro)meteorological variables for Amsterdam which is named ERA-urban. This will enable us to trace trends in thermal comfort and extreme precipitation.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"97 1","pages":"279-280"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76555339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reflections from a Decade of Running CCPForge
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00012 | Pages: 21-22
C. Jones, A. Kyffin, G. Poulter
This short paper shares the experience of running a collaborative software development platform for a decade. It examines what analysis of the platform's usage reveals about the software development practices of the community that used it, and the lessons learnt from supporting this community.
{"title":"Reflections from a Decade of Running CCPForge","authors":"C. Jones, A. Kyffin, G. Poulter","doi":"10.1109/eScience.2018.00012","DOIUrl":"https://doi.org/10.1109/eScience.2018.00012","url":null,"abstract":"This short paper shares the experience of running a collaborative software development platform for a decade and examines what the analysis of the usage shows about the software development practices of the community that used it and the lessons learnt from supporting this community.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"100 1","pages":"21-22"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89436720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Skluma: An Extensible Metadata Extraction Pipeline for Disorganized Data
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00040 | Pages: 256-266
Tyler J. Skluzacek, Rohan Kumar, Ryan Chard, Galen Harrison, Paul Beckman, K. Chard, Ian T Foster
To mitigate the effects of high-velocity data expansion and to automate the organization of filesystems and data repositories, we have developed Skluma, a system that automatically processes a target filesystem or repository, extracts content- and context-based metadata, and organizes the extracted metadata for subsequent use. Skluma can extract diverse metadata, including aggregate values derived from embedded structured data; named entities and latent topics buried within free-text documents; and content encoded in images. Skluma implements an overarching probabilistic pipeline to extract increasingly specific metadata from files. It applies machine learning methods to determine file types, dynamically prioritizes and then executes a suite of metadata extractors, and explores contextual metadata based on relationships among files. The derived metadata, represented in JSON, describes probabilistic knowledge of each file that may subsequently be used for discovery or organization. Skluma's architecture enables it to be deployed locally or used as an on-demand, cloud-hosted service to create and execute dynamic extraction workflows on massive numbers of files. It is modular and extensible, allowing users to contribute their own specialized metadata extractors. Thus far we have tested Skluma on local filesystems, remote FTP-accessible servers, and publicly accessible Globus endpoints. We have demonstrated its efficacy by applying it to a scientific environmental data repository of more than 500,000 files, showing that we can extract metadata from those files in a few hours at modest cloud cost.
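A minimal sketch of the extractor-dispatch pattern the abstract describes (illustrative only, not Skluma's actual code; the extractor functions and registry are hypothetical): guess a file's type, run the extractors registered for it, and merge their output into one JSON metadata record.

```python
# Hypothetical extractor dispatch, not the Skluma implementation.
import json
import mimetypes
from pathlib import Path

def extract_generic(path: Path) -> dict:
    # Context metadata every file gets, regardless of type.
    return {"name": path.name, "size_bytes": path.stat().st_size}

def extract_text(path: Path) -> dict:
    # Content metadata for free text; a real extractor would add
    # named entities and topic labels here.
    words = path.read_text(errors="ignore").split()
    return {"word_count": len(words), "head": " ".join(words[:10])}

# Type-specific registry; Skluma prioritizes its extractors dynamically.
EXTRACTORS = {"text/plain": [extract_generic, extract_text]}

def extract_metadata(path: Path) -> str:
    mime, _ = mimetypes.guess_type(path.name)
    record = {"mime_type": mime}
    for extractor in EXTRACTORS.get(mime, [extract_generic]):
        record.update(extractor(path))
    return json.dumps(record)

if __name__ == "__main__":
    for f in Path(".").glob("*.txt"):
        print(extract_metadata(f))
```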
{"title":"Skluma: An Extensible Metadata Extraction Pipeline for Disorganized Data","authors":"Tyler J. Skluzacek, Rohan Kumar, Ryan Chard, Galen Harrison, Paul Beckman, K. Chard, Ian T Foster","doi":"10.1109/eScience.2018.00040","DOIUrl":"https://doi.org/10.1109/eScience.2018.00040","url":null,"abstract":"To mitigate the effects of high-velocity data expansion and to automate the organization of filesystems and data repositories, we have developed Skluma-a system that automatically processes a target filesystem or repository, extracts content-and context-based metadata, and organizes extracted metadata for subsequent use. Skluma is able to extract diverse metadata, including aggregate values derived from embedded structured data; named entities and latent topics buried within free-text documents; and content encoded in images. Skluma implements an overarching probabilistic pipeline to extract increasingly specific metadata from files. It applies machine learning methods to determine file types, dynamically prioritizes and then executes a suite of metadata extractors, and explores contextual metadata based on relationships among files. The derived metadata, represented in JSON, describes probabilistic knowledge of each file that may be subsequently used for discovery or organization. Skluma's architecture enables it to be deployed both locally and used as an on-demand, cloud-hosted service to create and execute dynamic extraction workflows on massive numbers of files. It is modular and extensible-allowing users to contribute their own specialized metadata extractors. Thus far we have tested Skluma on local filesystems, remote FTP-accessible servers, and publicly-accessible Globus endpoints. We have demonstrated its efficacy by applying it to a scientific environmental data repository of more than 500,000 files. We show that we can extract metadata from those files with modest cloud costs in a few hours.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"115 1","pages":"256-266"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80854311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Serving Scientists in Agri-Food Area by Virtual Research Environments
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00124 | Pages: 405-406
A. Ballis, A. Boizet, Leonardo Candela, D. Castelli, E. Fernández, M. Filter, T. Günther, G. Kakaletris, P. Karampiperis, Dimitris Katris, R. Knapen, R. Lokers, L. Penev, G. Sipos, P. Zervas
Agri-food research calls for changes in the practices of data collection, collation, processing, analytics, and publishing, so as to fully benefit from and contribute to the Open Science movement. One of the major issues faced by agri-food researchers is the fragmentation of the "assets" that can be exploited when performing research tasks: data of interest are heterogeneous and scattered across several repositories, the tools exploited by modellers are diverse and often rely on local computing environments, and publishing practices vary and rarely aim at making available the "whole story" comprising datasets, processes, and workflows. This paper presents the AGINFRA+ endeavour to overcome these limitations by providing researchers in three designated communities with Virtual Research Environments that facilitate access to and use of the "assets" of interest and promote collaboration.
{"title":"Serving Scientists in Agri-Food Area by Virtual Research Environments","authors":"A. Ballis, A. Boizet, Leonardo Candela, D. Castelli, E. Fernández, M. Filter, T. Günther, G. Kakaletris, P. Karampiperis, Dimitris Katris, R. Knapen, R. Lokers, L. Penev, G. Sipos, P. Zervas","doi":"10.1109/eScience.2018.00124","DOIUrl":"https://doi.org/10.1109/eScience.2018.00124","url":null,"abstract":"Agri-food research calls for changes in the practices dealing with data collection, collation, processing and analytics, and publishing thus to fully benefit from and contribute to the Open Science movement. One of the major issues faced by the agri-food researchers is the fragmentation of the \"assets\" that can be exploited when performing research tasks, e.g. data of interest are heterogeneous and scattered across several repositories, the tools exploited by modellers are diverse and often rely on local computing environments, the publishing practices are various and rarely aim at making available the \"whole story\" with datasets, processes, workflows. This paper presents the AGINFRA+ endeavour to overcome these limitations by providing researchers in three designated communities with Virtual Research Environments facilitating the access to and use of the \"assets\" of interest and promote collaboration.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"29 1","pages":"405-406"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75415537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}