Ease Access to Climate Simulations for Researchers: IS-ENES Climate4Impact
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00080
C. Pagé, W. S. D. Cerff, M. Plieger, A. Spinuso, Xavier Pivan
Easier access to climate data is very important for the climate change impact research communities. Many aspects matter to these users, such as extensive guidance, transparent access to datasets, and on-demand processing capabilities (notably for data reduction). To fulfill this objective, the climate4impact (http://climate4impact.eu/) web portal and services have been developed in the European Union funded IS-ENES projects, targeting climate change impact modellers, impact and adaptation consultants, as well as other experts using climate change data. It provides users with harmonized access to climate model data through tailored services. One of the main objectives of climate4impact is to provide standardized web services and tools that are reusable in other portals. These services include web processing services, web coverage services and web mapping services. Tailored portals can be targeted to specific communities and/or countries/regions while making use of those services. Recently, it became obvious that, to fulfill users' needs for on-demand data processing and calculations, the climate4impact platform had to be able to use existing research and e-infrastructures in order to offer scalable and flexible services. This is especially true in the current context of rapidly growing climate science data volumes. To easily accommodate heterogeneous systems, a containerized and modular approach is envisioned. Finally, in the context of data processing delegation, a robust approach to metadata, provenance and lineage is required.
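As a hedged illustration of the kind of standardized web services the portal builds on, the sketch below issues an OGC WPS GetCapabilities request and lists the processes a service advertises; the endpoint URL is hypothetical and the parsing is deliberately minimal.

```python
# Minimal sketch: discover the processes offered by an OGC Web Processing
# Service (WPS), the kind of standardized service climate4impact exposes
# for on-demand data reduction. The endpoint below is hypothetical.
import requests
import xml.etree.ElementTree as ET

WPS_ENDPOINT = "https://example.org/wps"  # hypothetical endpoint

params = {"service": "WPS", "request": "GetCapabilities", "version": "1.0.0"}
response = requests.get(WPS_ENDPOINT, params=params, timeout=30)
response.raise_for_status()

# Print the identifier of every advertised process (e.g. subsetting or
# climate index calculation operators).
root = ET.fromstring(response.content)
for identifier in root.iter("{http://www.opengis.net/ows/1.1}Identifier"):
    print(identifier.text)
```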
{"title":"Ease Access to Climate Simulations for Researchers: IS-ENES Climate4Impact","authors":"C. Pagé, W. S. D. Cerff, M. Plieger, A. Spinuso, Xavier Pivan","doi":"10.1109/eScience.2019.00080","DOIUrl":"https://doi.org/10.1109/eScience.2019.00080","url":null,"abstract":"Easier access to climate data is very important for the climate change impact research communities. Many aspects are important for those users, such as extensive guidance, transparent access to datasets, on-demand processing capabilities (notably for data reduction). To fulfill this objective, the climate4impact (http://climate4impact.eu/) web portal and services has been developed in the European Union funded IS-ENES projects, targeting climate change impact modellers, impact and adaptation consultants, as well as other experts using climate change data. It provides to users harmonized access to climate model data through tailored services. One of the main objectives of climate4impact is to provide standardized web services and tools that are reusable in other portals. These services include web processing services, web coverage services and web mapping services. Tailored portals can be targeted to specific communities and/or countries/regions while making use of those services. Recently, it became obvious that to fulfill users' needs regarding on-demand data processing and calculations, the climate4impact platform had to be able to use existing research and e-infrastructures in order to offer scalable and flexible services. This is especially true in the current context of a large increase in the data volumes of climate science datasets. To easily accommodate heterogeneous systems, a containerized and modular approach is envisioned. Finally, in the context of data processing delegation, a robust approach for metadata, provenance and lineage is required.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"34 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123365713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streaming Graph Ingestion with Resource-Aware Buffering and Graph Compression
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00087
S. Dasgupta, A. Bagchi, Amarnath Gupta
Ingesting high-speed streaming data from social media into a graph database must overcome three problems: 1) the data can be highly bursty, 2) the data must be transformed into a graph, and 3) the graph database may not be able to ingest high-burst, high-velocity data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate these problems.
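A minimal sketch of the general idea, not the authors' implementation: edges are buffered and flushed to the graph store in batches, parallel edges are merged into weighted edges as a simple form of compression, and the batch size adapts to how compressible the stream is. All names are hypothetical.

```python
# Illustrative adaptive buffer with edge-merging "compression" for streaming
# graph ingestion; the graph-database client call is a stand-in.
from collections import defaultdict

def ingest_into_graph_db(batch):
    # Hypothetical stand-in for a bulk write to the graph database.
    print(f"ingesting {len(batch)} weighted edges")

class AdaptiveGraphBuffer:
    def __init__(self, min_batch=100, max_batch=10000):
        self.min_batch, self.max_batch = min_batch, max_batch
        self.batch_size = min_batch
        self.edges = []

    def add_edge(self, src, dst):
        self.edges.append((src, dst))
        if len(self.edges) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.edges:
            return
        # Compression: collapse duplicate (src, dst) pairs into weighted edges.
        weights = defaultdict(int)
        for src, dst in self.edges:
            weights[(src, dst)] += 1
        batch = [(s, d, w) for (s, d), w in weights.items()]
        ingest_into_graph_db(batch)
        # Adapt: if compression gained little, widen the window to absorb
        # bursts; if it gained a lot, shrink it to keep latency low.
        gain = 1 - len(batch) / len(self.edges)
        if gain < 0.1:
            self.batch_size = min(self.batch_size * 2, self.max_batch)
        else:
            self.batch_size = max(self.batch_size // 2, self.min_batch)
        self.edges.clear()

buf = AdaptiveGraphBuffer(min_batch=3)
for edge in [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d"), ("a", "b")]:
    buf.add_edge(*edge)
buf.flush()
```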
{"title":"Streaming Graph Ingestion with Resource-Aware Buffering and Graph Compression","authors":"S. Dasgupta, A. Bagchi, Amarnath Gupta","doi":"10.1109/eScience.2019.00087","DOIUrl":"https://doi.org/10.1109/eScience.2019.00087","url":null,"abstract":"Ingesting high-speed streaming data from social media into a graph database must overcome three problems – 1) the data can be really bursty, 2) the data must be transformed into a graph and 3) the graph database may not be able to ingest high-burst, high-velocity data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate the problem.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126101438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Distributed Information Composition in Big Data Systems
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00025
Haifa AlQuwaiee, Songlin He, C. Wu, Qiang Tang, Xuewen Shen
Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks, such as computing modules in scientific workflows or reducers in the MapReduce framework, are typically stored in a distributed file system across multiple data nodes. However, most existing systems do not provide a mechanism to compose such distributed information, as required by many big data applications. We construct analytical cost models and formulate a Distributed Information Composition problem in Big Data Systems, referred to as DIC-BDS, to aggregate multiple datasets stored as data blocks in the Hadoop Distributed File System (HDFS) using a composition operator of specific complexity to produce one final output. We rigorously prove that DIC-BDS is NP-complete, and propose two heuristic algorithms: Fixed-window Distributed Composition Scheme (FDCS) and Dynamic-window Distributed Composition Scheme with Delay (DDCS-D). We conduct extensive experiments on Google Cloud with composition operators of commonly considered degrees of complexity, including O(n), O(n log n), and O(n^2). Experimental results illustrate the performance superiority of the proposed solutions over existing methods. Specifically, FDCS outperforms all other algorithms in comparison for composition operators of complexity O(n) or O(n log n), while DDCS-D achieves the minimum total composition time with a composition operator of complexity O(n^2). These algorithms provide an additional level of data processing for efficient information aggregation in existing workflow and big data systems.
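A toy sketch of the fixed-window idea behind FDCS, not the paper's exact algorithm: blocks are composed within fixed-size windows as they become available, and the partial results are then composed into the final output. The compose operator and block contents are placeholders.

```python
# Fixed-window composition over a list of data blocks; `compose` stands in
# for a composition operator of complexity O(n), O(n log n) or O(n^2).
from functools import reduce

def compose(a, b):
    # Example O(n) operator: element-wise aggregation of two blocks.
    return [x + y for x, y in zip(a, b)]

def fixed_window_composition(blocks, window=4):
    partials = []
    for i in range(0, len(blocks), window):
        partials.append(reduce(compose, blocks[i:i + window]))
    return reduce(compose, partials)

blocks = [[i, i + 1, i + 2] for i in range(16)]  # toy stand-in for HDFS blocks
print(fixed_window_composition(blocks))
```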
{"title":"On Distributed Information Composition in Big Data Systems","authors":"Haifa AlQuwaiee, Songlin He, C. Wu, Qiang Tang, Xuewen Shen","doi":"10.1109/eScience.2019.00025","DOIUrl":"https://doi.org/10.1109/eScience.2019.00025","url":null,"abstract":"Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks such as computing modules in scientific workflows or reducers in the MapReduce framework are typically stored in a distributed file system across multiple data nodes. However, most existing systems do not provide a mechanism to compose such distributed information, as required by many big data applications. We construct analytical cost models and formulate a Distributed Information Composition problem in Big Data Systems, referred to as DIC-BDS, to aggregate multiple datasets stored as data blocks in Hadoop Distributed File System (HDFS) using a composition operator of specific complexity to produce one final output. We rigorously prove that DIC-BDS is NP-complete, and propose two heuristic algorithms: Fixed-window Distributed Composition Scheme (FDCS) and Dynamic-window Distributed Composition Scheme with Delay (DDCS-D). We conduct extensive experiments in Google clouds with various composition operators of commonly considered degrees of complexity including O(n), O(n log n), and O(n^2). Experimental results illustrate the performance superiority of the proposed solutions over existing methods. Specifically, FDCS outperforms all other algorithms in comparison with a composition operator of complexity O(n) or O(n log n), while DDCS-D achieves the minimum total composition time with a composition operator of complexity O(n^2). These algorithms provide an additional level of data processing for efficient information aggregation in existing workflow and big data systems.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129575218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
European HPC Landscape
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00062
Florian Berberich, Janina Liebmann, J. Nominé, Oriol Pineda, Philippe Segers, Veronica Teodor
This paper provides an overview of the European HPC landscape, supported by a survey designed by the PRACE-5IP project that reached more than 80 of the most influential HPC stakeholders in Europe. It focuses on Tier-0 systems at the European level, which provide high-end computing and data analysis resources. The different actors are presented and the services they provide are analyzed in order to identify overlaps and gaps, complementarity, and opportunities for collaboration. A new pan-European HPC portal is proposed to gather all this information in one place and give access to the different services.
{"title":"European HPC Landscape","authors":"Florian Berberich, Janina Liebmann, J. Nominé, Oriol Pineda, Philippe Segers, Veronica Teodor","doi":"10.1109/eScience.2019.00062","DOIUrl":"https://doi.org/10.1109/eScience.2019.00062","url":null,"abstract":"This paper provides an overview on the European HPC landscape supported by a survey, designed by the PRACE-5IP project, accessing more than 80 of the most influential stakeholders of HPC in Europe. It focuses on Tier-0 systems on a European level providing high-end computing and data analysis resources. The different actors are presented and their provided services are analyzed in order to identify overlaps and gaps, complementarity and opportunities for collaborations. A new pan-European HPC portal is proposed in order to get all information on one place and access the different services.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126433476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward a Dynamic Network-Centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00015
Eric J. Lyons, A. Mandal, G. Papadimitriou, Cong Wang, Komal Thareja, P. Ruth, J. J. Villalobos, I. Rodero, E. Deelman, M. Zink
Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer-2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.
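To illustrate why task clustering matters for throughput, the sketch below groups many short, independent tasks into cluster jobs and estimates the makespan under a fixed per-job scheduling overhead. The runtimes, overheads and slot counts are made-up numbers, not measurements from the paper.

```python
# Horizontal task clustering: trade per-job overhead against parallelism.
def cluster_tasks(tasks, cluster_size):
    return [tasks[i:i + cluster_size] for i in range(0, len(tasks), cluster_size)]

def estimate_makespan(tasks, cluster_size, task_runtime=2.0,
                      per_job_overhead=5.0, slots=8):
    clusters = cluster_tasks(tasks, cluster_size)
    job_time = per_job_overhead + task_runtime * cluster_size
    waves = -(-len(clusters) // slots)  # ceiling division over available slots
    return waves * job_time

tasks = list(range(64))
for size in (1, 4, 8, 16):
    print(f"cluster size {size:2d}: estimated makespan {estimate_makespan(tasks, size):.1f}s")
```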
{"title":"Toward a Dynamic Network-Centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing","authors":"Eric J. Lyons, A. Mandal, G. Papadimitriou, Cong Wang, Komal Thareja, P. Ruth, J. J. Villalobos, I. Rodero, E. Deelman, M. Zink","doi":"10.1109/eScience.2019.00015","DOIUrl":"https://doi.org/10.1109/eScience.2019.00015","url":null,"abstract":"Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and the workflows easier to configure for operational use and automation.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127367950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective Digital Object Access and Sharing Over a Networked Environment using DOIP and NDN
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00092
Cas Fahrenfort, Zhiming Zhao
FAIRness (findability, accessibility, interoperability and reusability) is crucial for enabling open science and innovation based on digital objects from large communities of providers and users. However, the gaps among version control, identification and distributed access systems often make it difficult to scale data-centric applications across large user communities and highly distributed infrastructures. This poster proposes a solution for accessing and sharing digital objects over a networked environment using the Digital Object Interface Protocol (DOIP) and Named Data Networking (NDN).
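A rough sketch of what a DOIP-style retrieve request could look like. The field names follow the general shape of DOIP 2.0 requests but should be checked against the specification; the identifier and service address are hypothetical, and the transport (plain JSON over a socket) is simplified for illustration.

```python
# Hypothetical DOIP-style request for retrieving a digital object by identifier.
import json
import socket

request = {
    "targetId": "20.5000.123/example-object",  # hypothetical identifier
    "operationId": "0.DOIP/Op.Retrieve",
}

def send_doip_request(host, port, message):
    # Simplified transport: one JSON message per line over a TCP socket.
    with socket.create_connection((host, port), timeout=30) as conn:
        conn.sendall(json.dumps(message).encode("utf-8") + b"\n")
        return conn.recv(65536).decode("utf-8")

# response = send_doip_request("doip.example.org", 9000, request)  # hypothetical service
```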
{"title":"Effective Digital Object Access and Sharing Over a Networked Environment using DOIP and NDN","authors":"Cas Fahrenfort, Zhiming Zhao","doi":"10.1109/eScience.2019.00092","DOIUrl":"https://doi.org/10.1109/eScience.2019.00092","url":null,"abstract":"FAIRness (findability, accessibility, interoperability and re-usability) is crucial for enabling open science and innovation based on digital objects from large communities of providers and users. However, the gaps among version control, identification and distributed access systems often make the scalability of data centric applications difficult across large user communities and highly distributed infrastructures. This poster proposes a solution for accessing and sharing digital objects over a networked environment using Digital object interface protocol (DOIP) and Named Data Networking (NDN).","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131644000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00057
G. Fox, S. Jha
We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations, together with a catalog of some activities. We cover eight patterns for linking ML to simulations or systems, plus three algorithmic areas: particle dynamics, agent-based models, and partial differential equations. The patterns are further divided into three action areas: Improving Simulation with Configurations and Integration of Data; Learn Structure, Theory and Model for Simulation; and Learn to make Surrogates.
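As a toy example of the "Learn to make Surrogates" pattern, the sketch below fits a cheap polynomial surrogate to samples from a stand-in "expensive" simulation and evaluates it on new inputs; the functions and settings are illustrative only.

```python
# Fit a surrogate model to input/output pairs from an expensive simulation,
# then use the surrogate in place of the simulation for new inputs.
import numpy as np

def expensive_simulation(x):
    # Stand-in for a costly solver (e.g. a PDE or particle-dynamics code).
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, 200)
y_train = expensive_simulation(x_train)

surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=7))

x_new = np.linspace(-2, 2, 5)
print("simulation:", expensive_simulation(x_new))
print("surrogate: ", surrogate(x_new))
```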
{"title":"Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations","authors":"G. Fox, S. Jha","doi":"10.1109/eScience.2019.00057","DOIUrl":"https://doi.org/10.1109/eScience.2019.00057","url":null,"abstract":"We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations together with a catalog of some activities. We cover eight patterns for the link of ML to the simulations or systems plus three algorithmic areas: particle dynamics, agent-based models and partial differential equations. The patterns are further divided into three action areas: Improving simulation with Configurations and Integration of Data, Learn Structure, Theory and Model for Simulation, and Learn to make Surrogates.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00046
D. Garijo, Maximiliano Osorio, D. Khider, V. Ratnakar, Y. Gil
Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain the human-readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this paper we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine-readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata. OKG-Soft supports the FAIR principles of findability, accessibility, interoperability, and reuse for software. We demonstrate the benefits of OKG-Soft with two applications: a browser for understanding scientific models in the environmental and social sciences, and a portal to combine climate, hydrology, agriculture, and economic software models.
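A hedged sketch of how machine-readable software metadata might be published and queried as a small knowledge graph, here using plain schema.org terms and rdflib rather than OKG-Soft's actual ontology or tooling.

```python
# Build a tiny software-metadata graph and query it with SPARQL.
from rdflib import Graph

turtle = """
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/software/> .

ex:hydroModel a schema:SoftwareSourceCode ;
    schema:name "Example hydrology model" ;
    schema:programmingLanguage "Python" ;
    schema:codeRepository <https://example.org/repo/hydro-model> .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

query = """
PREFIX schema: <http://schema.org/>
SELECT ?name ?repo WHERE {
    ?s a schema:SoftwareSourceCode ;
       schema:name ?name ;
       schema:codeRepository ?repo .
}
"""
for name, repo in g.query(query):
    print(name, repo)
```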
{"title":"OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata","authors":"D. Garijo, Maximiliano Osorio, D. Khider, V. Ratnakar, Y. Gil","doi":"10.1109/eScience.2019.00046","DOIUrl":"https://doi.org/10.1109/eScience.2019.00046","url":null,"abstract":"Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain human readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this paper we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata. OKG-Soft supports the FAIR principles of findability, accessibility, interoperability, and reuse for software. We demonstrate the benefits of OKG-Soft with two applications: a browser for understanding scientific models in the environmental and social sciences, and a portal to combine climate, hydrology, agriculture, and economic software models.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121922555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making Data FAIR Requires More than Just Principles: We Need Knowledge Technologies
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00071
M. Musen
Discussions regarding open science have circulated in the scientific community for many years. The articulation of the FAIR principles in 2016, however, led to a groundswell of excitement to make experimental data findable, accessible, interoperable, and reusable. The FAIR acronym is catchy and easy to remember. The 15 FAIR principles, however, are not. Efforts to enhance access to scientific datasets and to promote their reuse require intuitive tools that implement the FAIR principles as a side effect of their use. The CEDAR Workbench is one such tool that simplifies the authoring of standardized, comprehensive metadata to make datasets FAIR. Systems such as the CEDAR Workbench, which render datasets FAIR in a transparent fashion, can enhance open science as a direct byproduct of their use. Current projects that have adopted the CEDAR Workbench provide an opportunity to assess how well knowledge technologies can facilitate the creation of FAIR data.
{"title":"Making Data FAIR Requires More than Just Principles: We Need Knowledge Technologies","authors":"M. Musen","doi":"10.1109/eScience.2019.00071","DOIUrl":"https://doi.org/10.1109/eScience.2019.00071","url":null,"abstract":"Discussions regarding open science have circulated in the scientific community for many years. The articulation of the FAIR principles in 2016, however, led to a groundswell of excitement to make experimental data findable, accessible, interoperable, and reusable. The FAIR acronym is catchy and easy to remember. The 15 FAIR principles, however, are not. Efforts to enhance access to scientific datasets and to promote their reuse require intuitive tools that implement the FAIR principles as a side effect of their use. The CEDAR Workbench is one such tool that simplifies the authoring of standardized, comprehensive metadata to make datasets FAIR. Systems such as the CEDAR Workbench, which renders datasets FAIR in a transparent fashion, can enhance open science as a direct byproduct of their use. Current projects that have adopted the CEDAR Workbench provide an opportunity to assess how well knowledge technologies can facilitate the creation of FAIR data.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125021990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Performance Awareness for In Situ Scientific Applications
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00037
M. Wolf, J. Dominski, G. Merlo, J. Choi, G. Eisenhauer, S. Ethier, K. Huck, S. Klasky, Jeremy S. Logan, A. Malony, Chad Wood
Part of the promise of exascale computing and the next generation of scientific simulation codes is the ability to bring together time and spatial scales that have traditionally been treated separately. This enables creating complex coupled simulations and in situ analysis pipelines, encompassing such things as "whole device" fusion models or the simulation of cities from sewers to rooftops. Unfortunately, the HPC analysis tools that have been built up over the preceding decades are ill-suited to the debugging and performance analysis of such computational ensembles. In this paper, we present a new vision for performance measurement and understanding of HPC codes, MonitoringAnalytics (MONA). MONA is designed to be a flexible, high-performance monitoring infrastructure that can perform monitoring analysis in place or in transit by embedding analytics and characterization directly into the data stream, without relying upon delivering all monitoring information to a central database for post-processing. It addresses the trade-offs between the prohibitively expensive capture of all performance characteristics and not capturing enough to detect the features of interest. We demonstrate several uses of MONA: capturing and indexing multi-executable performance profiles to enable later processing; extracting performance primitives to enable the generation of customizable benchmarks and performance skeletons; and extracting communication and application behaviors to enable better control and placement for current and future runs of the science ensemble. Relevant performance information based on a system for MONA built from ADIOS and SOSflow technologies is provided for DOE science applications and leadership machines.
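As a minimal illustration of in-transit monitoring analytics (not the MONA implementation), the sketch below keeps running summary statistics over a stream of performance samples with Welford's online algorithm, so only compact aggregates need to travel downstream instead of every raw measurement.

```python
# Running count/mean/std over a monitoring stream via Welford's algorithm.
import math

class StreamingStats:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def summary(self):
        std = math.sqrt(self.m2 / self.n) if self.n else 0.0
        return {"count": self.n, "mean": self.mean, "std": std}

stats = StreamingStats()
for sample in [12.1, 11.8, 35.0, 12.3, 12.0]:  # e.g. per-step I/O times (ms)
    stats.update(sample)
print(stats.summary())
```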
{"title":"Scalable Performance Awareness for In Situ Scientific Applications","authors":"M. Wolf, J. Dominski, G. Merlo, J. Choi, G. Eisenhauer, S. Ethier, K. Huck, S. Klasky, Jeremy S. Logan, A. Malony, Chad Wood","doi":"10.1109/eScience.2019.00037","DOIUrl":"https://doi.org/10.1109/eScience.2019.00037","url":null,"abstract":"Part of the promise of exascale computing and the next generation of scientific simulation codes is the ability to bring together time and spatial scales that have traditionally been treated separately. This enables creating complex coupled simulations and in situ analysis pipelines, encompassing such things as \"whole device\" fusion models or the simulation of cities from sewers to rooftops. Unfortunately, the HPC analysis tools that have been built up over the preceding decades are ill suited to the debugging and performance analysis of such computational ensembles. In this paper, we present a new vision for performance measurement and understanding of HPC codes, MonitoringAnalytics (MONA). MONA is designed to be a flexible, high performance monitoring infrastructure that can perform monitoring analysis in place or in transit by embedding analytics and characterization directly into the data stream, without relying upon delivering all monitoring information to a central database for post-processing. It addresses the trade-offs between the prohibitively expensive capture of all performance characteristics and not capturing enough to detect the features of interest. We demonstrate several uses of MONA; capturing and indexing multi-executable performance profiles to enable later processing, extraction of performance primitives to enable the generation of customizable benchmarks and performance skeletons, and extracting communication and application behaviors to enable better control and placement for the current and future runs of the science ensemble. Relevant performance information based on a system for MONA built from ADIOS and SOSflow technologies is provided for DOE science applications and leadership machines.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129915829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}