Ease Access to Climate Simulations for Researchers: IS-ENES Climate4Impact
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00080
C. Pagé, W. S. D. Cerff, M. Plieger, A. Spinuso, Xavier Pivan
Easier access to climate data is very important for the climate change impact research communities. Many aspects matter to these users, such as extensive guidance, transparent access to datasets, and on-demand processing capabilities (notably for data reduction). To fulfill this objective, the climate4impact (http://climate4impact.eu/) web portal and services have been developed in the European Union funded IS-ENES projects, targeting climate change impact modellers, impact and adaptation consultants, as well as other experts using climate change data. It provides users with harmonized access to climate model data through tailored services. One of the main objectives of climate4impact is to provide standardized web services and tools that are reusable in other portals. These services include web processing services, web coverage services and web mapping services. Tailored portals can be targeted to specific communities and/or countries/regions while making use of those services. Recently, it became obvious that, to fulfill users' needs for on-demand data processing and calculations, the climate4impact platform had to be able to use existing research and e-infrastructures in order to offer scalable and flexible services. This is especially true in the current context of rapidly growing climate science data volumes. To easily accommodate heterogeneous systems, a containerized and modular approach is envisioned. Finally, in the context of data processing delegation, a robust approach to metadata, provenance and lineage is required.
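As a hedged illustration of the kind of standardized web services the portal builds on, the sketch below issues an OGC WPS GetCapabilities request and lists the processes a service advertises; the endpoint URL is hypothetical and the parsing is deliberately minimal.

```python
# Minimal sketch: discover the processes offered by an OGC Web Processing
# Service (WPS), the kind of standardized service climate4impact exposes
# for on-demand data reduction. The endpoint below is hypothetical.
import requests
import xml.etree.ElementTree as ET

WPS_ENDPOINT = "https://example.org/wps"  # hypothetical endpoint

params = {"service": "WPS", "request": "GetCapabilities", "version": "1.0.0"}
response = requests.get(WPS_ENDPOINT, params=params, timeout=30)
response.raise_for_status()

# Print the identifier of every advertised process (e.g. subsetting or
# climate index calculation operators).
root = ET.fromstring(response.content)
for identifier in root.iter("{http://www.opengis.net/ows/1.1}Identifier"):
    print(identifier.text)
```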
{"title":"Ease Access to Climate Simulations for Researchers: IS-ENES Climate4Impact","authors":"C. Pagé, W. S. D. Cerff, M. Plieger, A. Spinuso, Xavier Pivan","doi":"10.1109/eScience.2019.00080","DOIUrl":"https://doi.org/10.1109/eScience.2019.00080","url":null,"abstract":"Easier access to climate data is very important for the climate change impact research communities. Many aspects are important for those users, such as extensive guidance, transparent access to datasets, on-demand processing capabilities (notably for data reduction). To fulfill this objective, the climate4impact (http://climate4impact.eu/) web portal and services has been developed in the European Union funded IS-ENES projects, targeting climate change impact modellers, impact and adaptation consultants, as well as other experts using climate change data. It provides to users harmonized access to climate model data through tailored services. One of the main objectives of climate4impact is to provide standardized web services and tools that are reusable in other portals. These services include web processing services, web coverage services and web mapping services. Tailored portals can be targeted to specific communities and/or countries/regions while making use of those services. Recently, it became obvious that to fulfill users' needs regarding on-demand data processing and calculations, the climate4impact platform had to be able to use existing research and e-infrastructures in order to offer scalable and flexible services. This is especially true in the current context of a large increase in the data volumes of climate science datasets. To easily accommodate heterogeneous systems, a containerized and modular approach is envisioned. Finally, in the context of data processing delegation, a robust approach for metadata, provenance and lineage is required.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"34 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123365713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Streaming Graph Ingestion with Resource-Aware Buffering and Graph Compression
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00087
S. Dasgupta, A. Bagchi, Amarnath Gupta
Ingesting high-speed streaming data from social media into a graph database must overcome three problems: 1) the data can be highly bursty, 2) the data must be transformed into a graph, and 3) the graph database may not be able to ingest high-burst, high-velocity data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate these problems.
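A minimal sketch of the general idea, not the authors' implementation: edges are buffered and flushed to the graph store in batches, parallel edges are merged into weighted edges as a simple form of compression, and the batch size adapts to how compressible the stream is. All names are hypothetical.

```python
# Illustrative adaptive buffer with edge-merging "compression" for streaming
# graph ingestion; the graph-database client call is a stand-in.
from collections import defaultdict

def ingest_into_graph_db(batch):
    # Hypothetical stand-in for a bulk write to the graph database.
    print(f"ingesting {len(batch)} weighted edges")

class AdaptiveGraphBuffer:
    def __init__(self, min_batch=100, max_batch=10000):
        self.min_batch, self.max_batch = min_batch, max_batch
        self.batch_size = min_batch
        self.edges = []

    def add_edge(self, src, dst):
        self.edges.append((src, dst))
        if len(self.edges) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.edges:
            return
        # Compression: collapse duplicate (src, dst) pairs into weighted edges.
        weights = defaultdict(int)
        for src, dst in self.edges:
            weights[(src, dst)] += 1
        batch = [(s, d, w) for (s, d), w in weights.items()]
        ingest_into_graph_db(batch)
        # Adapt: if compression gained little, widen the window to absorb
        # bursts; if it gained a lot, shrink it to keep latency low.
        gain = 1 - len(batch) / len(self.edges)
        if gain < 0.1:
            self.batch_size = min(self.batch_size * 2, self.max_batch)
        else:
            self.batch_size = max(self.batch_size // 2, self.min_batch)
        self.edges.clear()

buf = AdaptiveGraphBuffer(min_batch=3)
for edge in [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d"), ("a", "b")]:
    buf.add_edge(*edge)
buf.flush()
```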
{"title":"Streaming Graph Ingestion with Resource-Aware Buffering and Graph Compression","authors":"S. Dasgupta, A. Bagchi, Amarnath Gupta","doi":"10.1109/eScience.2019.00087","DOIUrl":"https://doi.org/10.1109/eScience.2019.00087","url":null,"abstract":"Ingesting high-speed streaming data from social media into a graph database must overcome three problems – 1) the data can be really bursty, 2) the data must be transformed into a graph and 3) the graph database may not be able to ingest high-burst, high-velocity data. We have developed an adaptive buffering mechanism and a graph compression technique that effectively mitigate the problem.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126101438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On Distributed Information Composition in Big Data Systems
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00025
Haifa AlQuwaiee, Songlin He, C. Wu, Qiang Tang, Xuewen Shen
Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks, such as computing modules in scientific workflows or reducers in the MapReduce framework, are typically stored in a distributed file system across multiple data nodes. However, most existing systems do not provide a mechanism to compose such distributed information, as required by many big data applications. We construct analytical cost models and formulate a Distributed Information Composition problem in Big Data Systems, referred to as DIC-BDS, to aggregate multiple datasets stored as data blocks in the Hadoop Distributed File System (HDFS) using a composition operator of specific complexity to produce one final output. We rigorously prove that DIC-BDS is NP-complete, and propose two heuristic algorithms: Fixed-window Distributed Composition Scheme (FDCS) and Dynamic-window Distributed Composition Scheme with Delay (DDCS-D). We conduct extensive experiments on Google Cloud with composition operators of commonly considered degrees of complexity, including O(n), O(n log n), and O(n^2). Experimental results illustrate the performance superiority of the proposed solutions over existing methods. Specifically, FDCS outperforms all other algorithms in comparison for composition operators of complexity O(n) or O(n log n), while DDCS-D achieves the minimum total composition time with a composition operator of complexity O(n^2). These algorithms provide an additional level of data processing for efficient information aggregation in existing workflow and big data systems.
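A toy sketch of the fixed-window idea behind FDCS, not the paper's exact algorithm: blocks are composed within fixed-size windows as they become available, and the partial results are then composed into the final output. The compose operator and block contents are placeholders.

```python
# Fixed-window composition over a list of data blocks; `compose` stands in
# for a composition operator of complexity O(n), O(n log n) or O(n^2).
from functools import reduce

def compose(a, b):
    # Example O(n) operator: element-wise aggregation of two blocks.
    return [x + y for x, y in zip(a, b)]

def fixed_window_composition(blocks, window=4):
    partials = []
    for i in range(0, len(blocks), window):
        partials.append(reduce(compose, blocks[i:i + window]))
    return reduce(compose, partials)

blocks = [[i, i + 1, i + 2] for i in range(16)]  # toy stand-in for HDFS blocks
print(fixed_window_composition(blocks))
```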
{"title":"On Distributed Information Composition in Big Data Systems","authors":"Haifa AlQuwaiee, Songlin He, C. Wu, Qiang Tang, Xuewen Shen","doi":"10.1109/eScience.2019.00025","DOIUrl":"https://doi.org/10.1109/eScience.2019.00025","url":null,"abstract":"Modern big data computing systems exemplified by Hadoop employ parallel processing based on distributed storage. The results produced by parallel tasks such as computing modules in scientific workflows or reducers in the MapReduce framework are typically stored in a distributed file system across multiple data nodes. However, most existing systems do not provide a mechanism to compose such distributed information, as required by many big data applications. We construct analytical cost models and formulate a Distributed Information Composition problem in Big Data Systems, referred to as DIC-BDS, to aggregate multiple datasets stored as data blocks in Hadoop Distributed File System (HDFS) using a composition operator of specific complexity to produce one final output. We rigorously prove that DIC-BDS is NP-complete, and propose two heuristic algorithms: Fixed-window Distributed Composition Scheme (FDCS) and Dynamic-window Distributed Composition Scheme with Delay (DDCS-D). We conduct extensive experiments in Google clouds with various composition operators of commonly considered degrees of complexity including O(n), O(n log n), and O(n^2). Experimental results illustrate the performance superiority of the proposed solutions over existing methods. Specifically, FDCS outperforms all other algorithms in comparison with a composition operator of complexity O(n) or O(n log n), while DDCS-D achieves the minimum total composition time with a composition operator of complexity O(n^2). These algorithms provide an additional level of data processing for efficient information aggregation in existing workflow and big data systems.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129575218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
European HPC Landscape
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00062
Florian Berberich, Janina Liebmann, J. Nominé, Oriol Pineda, Philippe Segers, Veronica Teodor
This paper provides an overview of the European HPC landscape, supported by a survey designed by the PRACE-5IP project that reached more than 80 of the most influential HPC stakeholders in Europe. It focuses on Tier-0 systems at the European level, which provide high-end computing and data analysis resources. The different actors are presented and the services they provide are analyzed in order to identify overlaps and gaps, complementarity, and opportunities for collaboration. A new pan-European HPC portal is proposed to gather all this information in one place and give access to the different services.
{"title":"European HPC Landscape","authors":"Florian Berberich, Janina Liebmann, J. Nominé, Oriol Pineda, Philippe Segers, Veronica Teodor","doi":"10.1109/eScience.2019.00062","DOIUrl":"https://doi.org/10.1109/eScience.2019.00062","url":null,"abstract":"This paper provides an overview on the European HPC landscape supported by a survey, designed by the PRACE-5IP project, accessing more than 80 of the most influential stakeholders of HPC in Europe. It focuses on Tier-0 systems on a European level providing high-end computing and data analysis resources. The different actors are presented and their provided services are analyzed in order to identify overlaps and gaps, complementarity and opportunities for collaborations. A new pan-European HPC portal is proposed in order to get all information on one place and access the different services.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126433476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Toward a Dynamic Network-Centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00015
Eric J. Lyons, A. Mandal, G. Papadimitriou, Cong Wang, Komal Thareja, P. Ruth, J. J. Villalobos, I. Rodero, E. Deelman, M. Zink
Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer-2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.
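To illustrate why task clustering matters for throughput, the sketch below groups many short, independent tasks into cluster jobs and estimates the makespan under a fixed per-job scheduling overhead. The runtimes, overheads and slot counts are made-up numbers, not measurements from the paper.

```python
# Horizontal task clustering: trade per-job overhead against parallelism.
def cluster_tasks(tasks, cluster_size):
    return [tasks[i:i + cluster_size] for i in range(0, len(tasks), cluster_size)]

def estimate_makespan(tasks, cluster_size, task_runtime=2.0,
                      per_job_overhead=5.0, slots=8):
    clusters = cluster_tasks(tasks, cluster_size)
    job_time = per_job_overhead + task_runtime * cluster_size
    waves = -(-len(clusters) // slots)  # ceiling division over available slots
    return waves * job_time

tasks = list(range(64))
for size in (1, 4, 8, 16):
    print(f"cluster size {size:2d}: estimated makespan {estimate_makespan(tasks, size):.1f}s")
```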
{"title":"Toward a Dynamic Network-Centric Distributed Cloud Platform for Scientific Workflows: A Case Study for Adaptive Weather Sensing","authors":"Eric J. Lyons, A. Mandal, G. Papadimitriou, Cong Wang, Komal Thareja, P. Ruth, J. J. Villalobos, I. Rodero, E. Deelman, M. Zink","doi":"10.1109/eScience.2019.00015","DOIUrl":"https://doi.org/10.1109/eScience.2019.00015","url":null,"abstract":"Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect throughput of an ensemble of Nowcast workflows with improved turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and the workflows easier to configure for operational use and automation.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127367950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective Digital Object Access and Sharing Over a Networked Environment using DOIP and NDN
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00092
Cas Fahrenfort, Zhiming Zhao
FAIRness (findability, accessibility, interoperability and reusability) is crucial for enabling open science and innovation based on digital objects from large communities of providers and users. However, the gaps among version control, identification and distributed access systems often make it difficult to scale data-centric applications across large user communities and highly distributed infrastructures. This poster proposes a solution for accessing and sharing digital objects over a networked environment using the Digital Object Interface Protocol (DOIP) and Named Data Networking (NDN).
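A rough sketch of what a DOIP-style retrieve request could look like. The field names follow the general shape of DOIP 2.0 requests but should be checked against the specification; the identifier and service address are hypothetical, and the transport (plain JSON over a socket) is simplified for illustration.

```python
# Hypothetical DOIP-style request for retrieving a digital object by identifier.
import json
import socket

request = {
    "targetId": "20.5000.123/example-object",  # hypothetical identifier
    "operationId": "0.DOIP/Op.Retrieve",
}

def send_doip_request(host, port, message):
    # Simplified transport: one JSON message per line over a TCP socket.
    with socket.create_connection((host, port), timeout=30) as conn:
        conn.sendall(json.dumps(message).encode("utf-8") + b"\n")
        return conn.recv(65536).decode("utf-8")

# response = send_doip_request("doip.example.org", 9000, request)  # hypothetical service
```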
{"title":"Effective Digital Object Access and Sharing Over a Networked Environment using DOIP and NDN","authors":"Cas Fahrenfort, Zhiming Zhao","doi":"10.1109/eScience.2019.00092","DOIUrl":"https://doi.org/10.1109/eScience.2019.00092","url":null,"abstract":"FAIRness (findability, accessibility, interoperability and re-usability) is crucial for enabling open science and innovation based on digital objects from large communities of providers and users. However, the gaps among version control, identification and distributed access systems often make the scalability of data centric applications difficult across large user communities and highly distributed infrastructures. This poster proposes a solution for accessing and sharing digital objects over a networked environment using Digital object interface protocol (DOIP) and Named Data Networking (NDN).","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131644000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00057
G. Fox, S. Jha
We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations, together with a catalog of some activities. We cover eight patterns for linking ML to simulations or systems, plus three algorithmic areas: particle dynamics, agent-based models, and partial differential equations. The patterns are further divided into three action areas: Improving Simulation with Configurations and Integration of Data; Learn Structure, Theory and Model for Simulation; and Learn to make Surrogates.
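As a toy example of the "Learn to make Surrogates" pattern, the sketch below fits a cheap polynomial surrogate to samples from a stand-in "expensive" simulation and evaluates it on new inputs; the functions and settings are illustrative only.

```python
# Fit a surrogate model to input/output pairs from an expensive simulation,
# then use the surrogate in place of the simulation for new inputs.
import numpy as np

def expensive_simulation(x):
    # Stand-in for a costly solver (e.g. a PDE or particle-dynamics code).
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, 200)
y_train = expensive_simulation(x_train)

surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=7))

x_new = np.linspace(-2, 2, 5)
print("simulation:", expensive_simulation(x_new))
print("surrogate: ", surrogate(x_new))
```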
{"title":"Learning Everywhere: A Taxonomy for the Integration of Machine Learning and Simulations","authors":"G. Fox, S. Jha","doi":"10.1109/eScience.2019.00057","DOIUrl":"https://doi.org/10.1109/eScience.2019.00057","url":null,"abstract":"We present a taxonomy of research on Machine Learning (ML) applied to enhance simulations together with a catalog of some activities. We cover eight patterns for the link of ML to the simulations or systems plus three algorithmic areas: particle dynamics, agent-based models and partial differential equations. The patterns are further divided into three action areas: Improving simulation with Configurations and Integration of Data, Learn Structure, Theory and Model for Simulation, and Learn to make Surrogates.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115628006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00046
D. Garijo, Maximiliano Osorio, D. Khider, V. Ratnakar, Y. Gil
Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain the human-readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this paper we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine-readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata. OKG-Soft supports the FAIR principles of findability, accessibility, interoperability, and reuse for software. We demonstrate the benefits of OKG-Soft with two applications: a browser for understanding scientific models in the environmental and social sciences, and a portal to combine climate, hydrology, agriculture, and economic software models.
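A hedged sketch of how machine-readable software metadata might be published and queried as a small knowledge graph, here using plain schema.org terms and rdflib rather than OKG-Soft's actual ontology or tooling.

```python
# Build a tiny software-metadata graph and query it with SPARQL.
from rdflib import Graph

turtle = """
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/software/> .

ex:hydroModel a schema:SoftwareSourceCode ;
    schema:name "Example hydrology model" ;
    schema:programmingLanguage "Python" ;
    schema:codeRepository <https://example.org/repo/hydro-model> .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

query = """
PREFIX schema: <http://schema.org/>
SELECT ?name ?repo WHERE {
    ?s a schema:SoftwareSourceCode ;
       schema:name ?name ;
       schema:codeRepository ?repo .
}
"""
for name, repo in g.query(query):
    print(name, repo)
```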
{"title":"OKG-Soft: An Open Knowledge Graph with Machine Readable Scientific Software Metadata","authors":"D. Garijo, Maximiliano Osorio, D. Khider, V. Ratnakar, Y. Gil","doi":"10.1109/eScience.2019.00046","DOIUrl":"https://doi.org/10.1109/eScience.2019.00046","url":null,"abstract":"Scientific software is crucial for understanding, reusing and reproducing results in computational sciences. Software is often stored in code repositories, which may contain human readable instructions necessary to use it and set it up. However, a significant amount of time is usually required to understand how to invoke a software component, prepare data in the format it requires, and use it in combination with other software. In this paper we introduce OKG-Soft, an open knowledge graph that describes scientific software in a machine readable manner. OKG-Soft includes: 1) an ontology designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; and 3) a framework to annotate, query, explore and curate scientific software metadata. OKG-Soft supports the FAIR principles of findability, accessibility, interoperability, and reuse for software. We demonstrate the benefits of OKG-Soft with two applications: a browser for understanding scientific models in the environmental and social sciences, and a portal to combine climate, hydrology, agriculture, and economic software models.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121922555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Making Data FAIR Requires More than Just Principles: We Need Knowledge Technologies
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00071
M. Musen
Discussions regarding open science have circulated in the scientific community for many years. The articulation of the FAIR principles in 2016, however, led to a groundswell of excitement to make experimental data findable, accessible, interoperable, and reusable. The FAIR acronym is catchy and easy to remember. The 15 FAIR principles, however, are not. Efforts to enhance access to scientific datasets and to promote their reuse require intuitive tools that implement the FAIR principles as a side effect of their use. The CEDAR Workbench is one such tool that simplifies the authoring of standardized, comprehensive metadata to make datasets FAIR. Systems such as the CEDAR Workbench, which render datasets FAIR in a transparent fashion, can enhance open science as a direct byproduct of their use. Current projects that have adopted the CEDAR Workbench provide an opportunity to assess how well knowledge technologies can facilitate the creation of FAIR data.
{"title":"Making Data FAIR Requires More than Just Principles: We Need Knowledge Technologies","authors":"M. Musen","doi":"10.1109/eScience.2019.00071","DOIUrl":"https://doi.org/10.1109/eScience.2019.00071","url":null,"abstract":"Discussions regarding open science have circulated in the scientific community for many years. The articulation of the FAIR principles in 2016, however, led to a groundswell of excitement to make experimental data findable, accessible, interoperable, and reusable. The FAIR acronym is catchy and easy to remember. The 15 FAIR principles, however, are not. Efforts to enhance access to scientific datasets and to promote their reuse require intuitive tools that implement the FAIR principles as a side effect of their use. The CEDAR Workbench is one such tool that simplifies the authoring of standardized, comprehensive metadata to make datasets FAIR. Systems such as the CEDAR Workbench, which renders datasets FAIR in a transparent fashion, can enhance open science as a direct byproduct of their use. Current projects that have adopted the CEDAR Workbench provide an opportunity to assess how well knowledge technologies can facilitate the creation of FAIR data.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125021990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scalable Performance Awareness for In Situ Scientific Applications
Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00037
M. Wolf, J. Dominski, G. Merlo, J. Choi, G. Eisenhauer, S. Ethier, K. Huck, S. Klasky, Jeremy S. Logan, A. Malony, Chad Wood
Part of the promise of exascale computing and the next generation of scientific simulation codes is the ability to bring together time and spatial scales that have traditionally been treated separately. This enables creating complex coupled simulations and in situ analysis pipelines, encompassing such things as "whole device" fusion models or the simulation of cities from sewers to rooftops. Unfortunately, the HPC analysis tools that have been built up over the preceding decades are ill-suited to the debugging and performance analysis of such computational ensembles. In this paper, we present a new vision for performance measurement and understanding of HPC codes, MonitoringAnalytics (MONA). MONA is designed to be a flexible, high-performance monitoring infrastructure that can perform monitoring analysis in place or in transit by embedding analytics and characterization directly into the data stream, without relying upon delivering all monitoring information to a central database for post-processing. It addresses the trade-offs between the prohibitively expensive capture of all performance characteristics and not capturing enough to detect the features of interest. We demonstrate several uses of MONA: capturing and indexing multi-executable performance profiles to enable later processing; extracting performance primitives to enable the generation of customizable benchmarks and performance skeletons; and extracting communication and application behaviors to enable better control and placement for current and future runs of the science ensemble. Relevant performance information based on a system for MONA built from ADIOS and SOSflow technologies is provided for DOE science applications and leadership machines.
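As a minimal illustration of in-transit monitoring analytics (not the MONA implementation), the sketch below keeps running summary statistics over a stream of performance samples with Welford's online algorithm, so only compact aggregates need to travel downstream instead of every raw measurement.

```python
# Running count/mean/std over a monitoring stream via Welford's algorithm.
import math

class StreamingStats:
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def summary(self):
        std = math.sqrt(self.m2 / self.n) if self.n else 0.0
        return {"count": self.n, "mean": self.mean, "std": std}

stats = StreamingStats()
for sample in [12.1, 11.8, 35.0, 12.3, 12.0]:  # e.g. per-step I/O times (ms)
    stats.update(sample)
print(stats.summary())
```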
{"title":"Scalable Performance Awareness for In Situ Scientific Applications","authors":"M. Wolf, J. Dominski, G. Merlo, J. Choi, G. Eisenhauer, S. Ethier, K. Huck, S. Klasky, Jeremy S. Logan, A. Malony, Chad Wood","doi":"10.1109/eScience.2019.00037","DOIUrl":"https://doi.org/10.1109/eScience.2019.00037","url":null,"abstract":"Part of the promise of exascale computing and the next generation of scientific simulation codes is the ability to bring together time and spatial scales that have traditionally been treated separately. This enables creating complex coupled simulations and in situ analysis pipelines, encompassing such things as \"whole device\" fusion models or the simulation of cities from sewers to rooftops. Unfortunately, the HPC analysis tools that have been built up over the preceding decades are ill suited to the debugging and performance analysis of such computational ensembles. In this paper, we present a new vision for performance measurement and understanding of HPC codes, MonitoringAnalytics (MONA). MONA is designed to be a flexible, high performance monitoring infrastructure that can perform monitoring analysis in place or in transit by embedding analytics and characterization directly into the data stream, without relying upon delivering all monitoring information to a central database for post-processing. It addresses the trade-offs between the prohibitively expensive capture of all performance characteristics and not capturing enough to detect the features of interest. We demonstrate several uses of MONA; capturing and indexing multi-executable performance profiles to enable later processing, extraction of performance primitives to enable the generation of customizable benchmarks and performance skeletons, and extracting communication and application behaviors to enable better control and placement for the current and future runs of the science ensemble. Relevant performance information based on a system for MONA built from ADIOS and SOSflow technologies is provided for DOE science applications and leadership machines.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129915829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}