
Latest publications from the 2018 IEEE 14th International Conference on e-Science (e-Science)

Evaluating Layer-Wise Relevance Propagation Explainability Maps for Artificial Neural Networks
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00107
E. Ranguelova, E. Pauwels, J. Berkhout
Layer-wise relevance propagation (LRP) heatmaps aim to provide a graphical explanation for the decisions of a classifier. This could be of great benefit to scientists, helping them trust complex black-box models and gain insights from their data. LRP heatmaps tested on benchmark datasets are reported to correlate significantly with interpretable image features. In this work, we investigate these claims and propose refinements to them.
{"title":"Evaluating Layer-Wise Relevance Propagation Explainability Maps for Artificial Neural Networks","authors":"E. Ranguelova, E. Pauwels, J. Berkhout","doi":"10.1109/eScience.2018.00107","DOIUrl":"https://doi.org/10.1109/eScience.2018.00107","url":null,"abstract":"Layer-wise relevance propagation (LRP) heatmaps aim to provide graphical explanation for decisions of a classifier. This could be of great benefit to scientists for trusting complex black-box models and getting insights from their data. The LRP heatmaps tested on benchmark datasets are reported to correlate significantly with interpretable image features. In this work, we investigate these claims and propose to refine them.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"26 1","pages":"377-378"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78233940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
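As a rough illustration of what the heatmaps in this paper visualize, the LRP ε-rule for a single dense layer can be sketched in plain Python. This is a generic textbook-style sketch, not the implementation the authors evaluated; the toy weights, activations, and output relevances are made up:

```python
def lrp_epsilon(W, a, R_out, eps=1e-6):
    """LRP epsilon-rule for one dense layer.

    W[i][j] is the weight from input neuron i to output neuron j,
    a[i] the input activation, R_out[j] the relevance assigned to
    output j. Returns the relevance redistributed onto the inputs.
    """
    n_in, n_out = len(W), len(W[0])
    # contributions z_ij = a_i * w_ij
    z = [[a[i] * W[i][j] for j in range(n_out)] for i in range(n_in)]
    # pre-activations z_j, with a small stabilizer to avoid division by zero
    s = [sum(z[i][j] for i in range(n_in)) for j in range(n_out)]
    s = [sj + eps * (1.0 if sj >= 0 else -1.0) for sj in s]
    # R_i = sum_j z_ij * R_j / z_j  -- relevance is (approximately) conserved
    return [sum(z[i][j] * R_out[j] / s[j] for j in range(n_out))
            for i in range(n_in)]

# Toy layer: the input relevances sum back to the output relevance (2.0).
R_in = lrp_epsilon([[1.0, -0.5], [0.5, 2.0]], [1.0, 2.0], [1.0, 1.0])
```

Relevance conservation (up to the ε stabilizer) is exactly the property that makes the resulting heatmaps interpretable as a decomposition of the classifier's output.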
Open Knowledge Discovery and Data Mining from Patient Forums
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00119
A. Dirkson, S. Verberne, G. Oortmerssen, H. Gelderblom, Wessel Kraaij
n/a
{"title":"Open Knowledge Discovery and Data Mining from Patient Forums","authors":"A. Dirkson, S. Verberne, G. Oortmerssen, H. Gelderblom, Wessel Kraaij","doi":"10.1109/eScience.2018.00119","DOIUrl":"https://doi.org/10.1109/eScience.2018.00119","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"31 1","pages":"397-398"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78607378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Survey on Research Software Engineering in the Netherlands
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00017
B. V. Werkhoven, T. Bakker, Olivier Philippe, S. Hettrick
This paper presents a brief overview of the Research Software Engineering landscape in the Netherlands and includes a summary of the results from a survey held in December 2017 in the Netherlands and several other countries. The results show that best practices are widely adopted. Research software is produced by small teams or individuals, is often used for scientific publications, and is frequently acknowledged in publications.
{"title":"Survey on Research Software Engineering in the Netherlands","authors":"B. V. Werkhoven, T. Bakker, Olivier Philippe, S. Hettrick","doi":"10.1109/eScience.2018.00017","DOIUrl":"https://doi.org/10.1109/eScience.2018.00017","url":null,"abstract":"This paper presents a brief overview of the Research Software Engineering landscape in the Netherlands and includes a summary of the results from a survey held in December 2017 in the Netherlands and several other countries. The results show that best practices are widely adopted. Research software is produced by small teams or individuals, is often used for scientific publications, and is frequently acknowledged in publications.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"34 1","pages":"38-39"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72918293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Modelling Implicit Content Networks to Track Information Propagation Across Media Sources to Analyze News Events
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00136
Anirudh Joshi, R. Sinnott
With the rise of the Internet as the premier news source for billions of people around the world, the propagation of news media online now influences many critical decisions made by society every day. Fake news is now a mainstream concern. In the context of news propagation, recent works in media analysis largely focus on extracting clusters, news events, and stories, or on tracking links or conserved sentences at aggregate levels between sources. However, the insight provided by these approaches is limited for analysis and context for end users. To tackle this, we present an approach that models, at a semantic level, the implicit content networks inherent within the news event clusters users see daily, through the generation of semantic content indexes. The approach is based on an end-to-end unsupervised machine learning system trained on real-life news data, combined with algorithms that generate useful contextual views of sources and the inter-relationships of news events. We illustrate how the approach is able to track conserved semantic context through a combination of machine learning techniques, including document vectors, k-nearest neighbors, and hierarchical agglomerative clustering. We demonstrate the system by training semantic vector models on real-world data taken from the Signal News dataset. We quantitatively evaluate its performance against existing state-of-the-art systems to demonstrate the end-to-end capability. We then qualitatively demonstrate the usefulness of a news-event-centered semantic content index graph for end-user applications. This is evaluated with respect to the goal of generating rich contextual interconnections and providing differential background on how news media sources report, parrot, and position information on ostensibly identical news events.
{"title":"Modelling Implicit Content Networks to Track Information Propagation Across Media Sources to Analyze News Events","authors":"Anirudh Joshi, R. Sinnott","doi":"10.1109/eScience.2018.00136","DOIUrl":"https://doi.org/10.1109/eScience.2018.00136","url":null,"abstract":"With the rise of the Internet as the premier news source for billions of people around the world, the propagation of news media online now influences many critical decisions made by society every day. Fake news is now a mainstream concern. In the context of news propagation, recent works in media analysis largely focus on extracting clusters, news events, stories or tracking links or conserved sentences at aggregate levels between sources. However, the insight provided by these approaches is limited for analysis and context for end users. To tackle this, we present an approach to model implicit content networks at a semantic level that is inherent within news event clusters as seen by users on a daily basis through the generation of semantic content indexes. The approach is based on an end-to-end unsupervised machine learning system trained on real-life news data that combine together with algorithms to generate useful contextual views of the sources and the inter-relationships of news events. We illustrate how the approach is able to track conserved semantic context through the use of a combination of machine learning techniques, including document vectors, k-nearest neighbors and the use of hierarchical agglomerative clustering. We demonstrate the system by training semantic vector models on realistic real-world data taken from the Signal News dataset. We quantitatively evaluate the performance against existing state of the art systems to demonstrate the end-to-end capability. We then qualitatively demonstrate the usefulness of a news event centered semantic content index graph for end-user applications. 
This is evaluated with respect to the goal of generating rich contextual interconnections and providing differential background on how news media sources report, parrot and position information on ostensibly identical news events.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"27 1","pages":"475-485"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77177912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
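The clustering step this abstract mentions can be illustrated with a minimal single-linkage agglomerative sketch over toy "document vectors". This is our own illustration in plain Python; the paper's system uses learned document vectors and k-nearest neighbors at far larger scale:

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def agglomerate(vectors, threshold):
    """Single-linkage agglomerative clustering: repeatedly merge the two
    most similar clusters until no pair exceeds `threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best_sim, best_pair = threshold, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(cosine(vectors[i], vectors[j])
                          for i in clusters[x] for j in clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_pair is None:     # no pair is similar enough to merge
            break
        x, y = best_pair
        clusters[x].extend(clusters.pop(y))
    return clusters

# Two near-duplicate "articles" merge into one news event; the third stays apart.
toy_vectors = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.1, 1.0]]
events = agglomerate(toy_vectors, threshold=0.8)
```

Grouping semantically close articles this way is what lets conserved context be tracked across sources reporting the same event.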
The Role of Data Stewardship in Software Sustainability and Reproducibility
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00009
Maria J. Cruz, Shalini Kurapati, Yasemin Turkyilmaz-van der Velden
Software and computational tools are instrumental to scientific investigation in today's digitized research environment. Despite this crucial role, the path towards implementing best practices for the reproducibility and sustainability of research software is challenging. Delft University of Technology has recently begun a novel data stewardship initiative - disciplinary support for research data management - one of whose main aims is achieving reproducibility of scientific results in general. In this paper, we explore the potential of data stewardship for supporting software reproducibility and sustainability as well. Recently, we gathered the key stakeholders on this topic (researchers, research software engineers, and data stewards) in a workshop setting to understand the challenges and barriers, the support required to achieve software sustainability and reproducibility, and how all three parties can work together efficiently. Based on the insights from the workshop, as well as our professional experience as data stewards, we draw conclusions on possible ways forward to achieve the important goal of software reproducibility and sustainability through coordinated efforts of the key stakeholders.
{"title":"The Role of Data Stewardship in Software Sustainability and Reproducibility","authors":"Maria J. Cruz, Shalini Kurapati, Yasemin Turkyilmaz-van der Velden","doi":"10.1109/eScience.2018.00009","DOIUrl":"https://doi.org/10.1109/eScience.2018.00009","url":null,"abstract":"Software and computational tools are instrumental for scientific investigation in today's digitized research environment. Despite this crucial role, the path towards implementing best practices to achieve reproducibility and sustainability of research software is challenging. Delft University of Technology has begun recently a novel initiative of data stewardship - disciplinary support for research data management, one of the main aims of which is achieving reproducibility of scientific results in general. In this paper, we aim to explore the potential of data stewardship for supporting software reproducibility and sustainability as well. Recently, we gathered the key stakeholders of the topic (i.e. researchers, research software engineers, and data stewards) in a workshop setting to understand the challenges and barriers, the support required to achieve software sustainability and reproducibility, and how all the three parties can efficiently work together. 
Based on the insights from the workshop, as well as our professional experience as data stewards, we draw conclusions on possible ways forward to achieve the important goal of software reproducibility and sustainability through coordinated efforts of the key stakeholders.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"6 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79714526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
TI-One: Active Research Data Management in a Modern Philosophy Department
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00070
Gioele Barabucci, Mark Eschweiler, Andreas Speer
When it comes to managing their digital data, researchers are often left to their own devices, with little guidance from their hosting institution. These problems are exacerbated in the humanities, in which each project is seen as a separate world that needs special solutions, leading to data losses and an accumulation of technical debt. This paper presents our vision and progress on TI-One: a department-wide system that guides the management of the data of the whole Thomas-Institut, part of the Philosophy Faculty of the University of Cologne. The novel features of TI-One are 1) a department-wide set of guidelines and conventions, 2) the materialization of live data from non-file sources (e.g., DBs), and 3) a versioning system with extended metadata that creates an almost effortless path from automated backups to proper long-term archival of research data.
{"title":"TI-One: Active Research Data Management in a Modern Philosophy Department","authors":"Gioele Barabucci, Mark Eschweiler, Andreas Speer","doi":"10.1109/eScience.2018.00070","DOIUrl":"https://doi.org/10.1109/eScience.2018.00070","url":null,"abstract":"When it comes to managing their digital data, researchers are often left to their own devices, with little guidance from their hosting institution. These problems are exacerbated in the humanities, in which each project is seen as a separate world that needs special solutions, leading to data losses and an accumulation of technical debt. This paper presents our vision and progress on TI-One: a department-wide system that guides the management of the data of the whole Thomas-Institut, part of the Philosophy Faculty of the University of Cologne. The novel features of TI-One are 1) a department-wide set of guidelines and conventions, 2) the materialization of live data from non-file sources (e.g., DBs), 3) a versioning system with extended metadata that creates an almost effortless path from automated backups to proper long-term archival of research data.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"23 1","pages":"314-315"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81707348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Simulating HEP Workflows on Heterogeneous Architectures
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00087
C. Leggett, I. Shapoval
The next generation of supercomputing facilities, such as Oak Ridge's Summit and Lawrence Livermore's Sierra, show an increasing use of GPGPUs and other accelerators in order to achieve their high FLOP counts. This trend will only grow with exascale facilities. In general, High Energy Physics computing workflows have made little use of GPUs due to the relatively small fraction of kernels that run efficiently on GPUs, and the expense of rewriting code for rapidly evolving GPU hardware. However, the computing requirements for high-luminosity LHC are enormous, and it will become essential to be able to make use of supercomputing facilities that rely heavily on GPUs and other accelerator technologies. ATLAS has already developed an extension to AthenaMT, its multithreaded event processing framework, that enables the non-intrusive offloading of computations to external accelerator resources, and is developing strategies to schedule the offloading efficiently. Before investing heavily in writing many kernels, we need to better understand the performance metrics and throughput bounds of the workflows with various accelerator configurations. This can be done by simulating the workflows, using real metrics for task interdependencies and timing, as we vary fractions of offloaded tasks, latencies, data conversion speeds, memory bandwidths, and accelerator offloading parameters such as CPU/GPU ratios and speeds. We present the results of these studies, which will be instrumental in directing effort to make the ATLAS framework, kernels and workflows run efficiently on exascale facilities.
{"title":"Simulating HEP Workflows on Heterogeneous Architectures","authors":"C. Leggett, I. Shapoval","doi":"10.1109/eScience.2018.00087","DOIUrl":"https://doi.org/10.1109/eScience.2018.00087","url":null,"abstract":"The next generation of supercomputing facilities, such as Oak Ridge's Summit and Lawrence Livermore's Sierra, show an increasing use of GPGPUs and other accelerators in order to achieve their high FLOP counts. This trend will only grow with exascale facilities. In general, High Energy Physics computing workflows have made little use of GPUs due to the relatively small fraction of kernels that run efficiently on GPUs, and the expense of rewriting code for rapidly evolving GPU hardware. However, the computing requirements for high-luminosity LHC are enormous, and it will become essential to be able to make use of supercomputing facilities that rely heavily on GPUs and other accelerator technologies. ATLAS has already developed an extension to AthenaMT, its multithreaded event processing framework, that enables the non-intrusive offloading of computations to external accelerator resources, and is developing strategies to schedule the offloading efficiently. Before investing heavily in writing many kernels, we need to better understand the performance metrics and throughput bounds of the workflows with various accelerator configurations. This can be done by simulating the workflows, using real metrics for task interdependencies and timing, as we vary fractions of offloaded tasks, latencies, data conversion speeds, memory bandwidths, and accelerator offloading parameters such as CPU/GPU ratios and speeds. 
We present the results of these studies, which will be instrumental in directing effort to make the ATLAS framework, kernels and workflows run efficiently on exascale facilities.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"6 1","pages":"343-343"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88840048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
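The kind of what-if modelling this abstract describes — varying the offloaded fraction, accelerator speedup, and transfer overhead — can be caricatured in a few lines. This toy per-event cost model is our own illustration with hypothetical parameters, not the authors' simulator or ATLAS measurements:

```python
def workflow_time(cpu_time, offload_frac, gpu_speedup, transfer_overhead):
    """Toy per-event processing time when a fraction of kernel work is
    offloaded to an accelerator. All parameters are hypothetical."""
    if offload_frac <= 0.0:
        return cpu_time                     # nothing offloaded, no transfer cost
    on_cpu = cpu_time * (1.0 - offload_frac)
    on_gpu = cpu_time * offload_frac / gpu_speedup
    return on_cpu + on_gpu + transfer_overhead

# Sweep the offloaded fraction to see where offloading starts to pay off.
sweep = [(f / 10.0, workflow_time(100.0, f / 10.0, 10.0, 2.0))
         for f in range(11)]
```

Even this caricature shows the trade-off the paper studies: offloading only helps once the accelerator speedup on the offloaded fraction outweighs the fixed transfer and conversion overheads.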
Towards Exascale Computing for High Energy Physics: The ATLAS Experience at ORNL
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00086
V. Ananthraj, K. De, S. Jha, A. Klimentov, D. Oleynik, S. Oral, André Merzky, R. Mashinistov, S. Panitkin, P. Svirin, M. Turilli, J. Wells, Sean R. Wilkinson
Traditionally, the ATLAS experiment at the Large Hadron Collider (LHC) has utilized distributed resources as provided by the Worldwide LHC Computing Grid (WLCG) to support data distribution, data analysis and simulations. For example, the ATLAS experiment uses a geographically distributed grid of approximately 200,000 cores continuously (250,000 cores at peak), over 1,000 million core-hours per year, to process, simulate, and analyze its data (today's total data volume of ATLAS is more than 300 PB). After the early success in discovering a new particle consistent with the long-awaited Higgs boson, ATLAS is continuing the precision measurements necessary for further discoveries. The planned high-luminosity LHC upgrade and related ATLAS detector upgrades, which are necessary for physics searches beyond the Standard Model, pose a serious challenge for ATLAS computing. Data volumes are expected to increase at higher energy and luminosity, causing the storage and computing needs to grow at a much higher pace than the flat-budget technology evolution (see Fig. 1). The need for simulation and analysis will overwhelm the expected capacity of WLCG computing facilities unless the range and precision of physics studies are curtailed.
{"title":"Towards Exascale Computing for High Energy Physics: The ATLAS Experience at ORNL","authors":"V. Ananthraj, K. De, S. Jha, A. Klimentov, D. Oleynik, S. Oral, André Merzky, R. Mashinistov, S. Panitkin, P. Svirin, M. Turilli, J. Wells, Sean R. Wilkinson","doi":"10.1109/eScience.2018.00086","DOIUrl":"https://doi.org/10.1109/eScience.2018.00086","url":null,"abstract":"Traditionally, the ATLAS experiment at Large Hadron Collider (LHC) has utilized distributed resources as provided by the Worldwide LHC Computing Grid (WLCG) to support data distribution, data analysis and simulations. For example, the ATLAS experiment uses a geographically distributed grid of approximately 200,000 cores continuously (250 000 cores at peak), (over 1,000 million core-hours per year) to process, simulate, and analyze its data (todays total data volume of ATLAS is more than 300 PB). After the early success in discovering a new particle consistent with the long-awaited Higgs boson, ATLAS is continuing the precision measurements necessary for further discoveries. Planned high-luminosity LHC upgrade and related ATLAS detector upgrades, that are necessary for physics searches beyond Standard Model, pose serious challenge for ATLAS computing. Data volumes are expected to increase at higher energy and luminosity, causing the storage and computing needs to grow at a much higher pace than the flat budget technology evolution (see Fig. 1). 
The need for simulation and analysis will overwhelm the expected capacity of WLCG computing facilities unless the range and precision of physics studies will be curtailed.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"27 1","pages":"341-342"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83730700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Semantic Software Metadata for Workflow Exploration and Evolution
Pub Date: 2018-10-01 DOI: 10.1109/ESCIENCE.2018.00132
L. Carvalho, D. Garijo, C. B. Medeiros, Y. Gil
Scientific workflow management systems play a major role in the design, execution and documentation of computational experiments. However, they have limited support for managing workflow evolution and exploration because they lack rich metadata for the software that implements workflow components. Such metadata could be used to support scientists in exploring local adjustments to a workflow, replacing components with similar software, or upgrading components upon release of newer software versions. To address this challenge, we propose OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution. Our approach uses a novel ontology to describe the functionality and evolution through time of any software used to create workflow components. OntoSoft-VFF is implemented as an online catalog that stores semantic metadata for software to enable workflow exploration through understanding of software functionality and evolution. The catalog also supports comparison and semantic search of software metadata. We showcase OntoSoft-VFF using machine learning workflow examples. We validate our approach by testing that a workflow system could compare differences in software metadata, explain software updates and describe the general functionality of workflow steps.
{"title":"Semantic Software Metadata for Workflow Exploration and Evolution","authors":"L. Carvalho, D. Garijo, C. B. Medeiros, Y. Gil","doi":"10.1109/ESCIENCE.2018.00132","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2018.00132","url":null,"abstract":"Scientific workflow management systems play a major role in the design, execution and documentation of computational experiments. However, they have limited support for managing workflow evolution and exploration because they lack rich metadata for the software that implements workflow components. Such metadata could be used to support scientists in exploring local adjustments to a workflow, replacing components with similar software, or upgrading components upon release of newer software versions. To address this challenge, we propose OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution. Our approach uses a novel ontology to describe the functionality and evolution through time of any software used to create workflow components. OntoSoft-VFF is implemented as an online catalog that stores semantic metadata for software to enable workflow exploration through understanding of software functionality and evolution. The catalog also supports comparison and semantic search of software metadata. We showcase OntoSoft-VFF using machine learning workflow examples. 
We validate our approach by testing that a workflow system could compare differences in software metadata, explain software updates and describe the general functionality of workflow steps.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"11 1","pages":"431-441"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85248297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
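To illustrate the sort of version comparison such a catalog enables, here is a minimal sketch over plain dictionaries mapping function names to signatures. The structure and names are hypothetical; OntoSoft-VFF itself represents this information as an ontology with far richer semantics:

```python
def compare_versions(old, new):
    """Compare two snapshots of software metadata (function name ->
    signature) and report what a workflow built on the old version
    should review before upgrading. Illustrative structure only."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(f for f in set(old) & set(new) if old[f] != new[f])
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical metadata for two releases of a workflow component.
v1 = {"train": "train(data)", "predict": "predict(x)"}
v2 = {"train": "train(data, seed)", "predict": "predict(x)",
      "explain": "explain(x)"}
report = compare_versions(v1, v2)
```

A report like this is the kind of signal a workflow system could use to flag which steps need adjustment when a component is upgraded.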
Catching Toad Calls in the Cloud: Commodity Edge Computing for Flexible Analysis of Big Sound Data
Pub Date: 2018-10-01 DOI: 10.1109/eScience.2018.00022
P. Roe, Meriem Ferroudj, M. Towsey, L. Schwarzkopf
Passive acoustic recording has great potential for monitoring both endangered and pest species. However, the automatic analysis of natural sound recordings is challenging due to geographic variation in background sounds in habitats and species calls. We have designed and deployed an acoustic sensor network constituting an early warning system for a vocal invasive species, in particular cane toads. The challenging nature of recognising toad calls and the big data arising from sound recording gave rise to a novel edge computing system which permits both effective monitoring and flexible experimentation. This is achieved through a multi-stage analysis system in which calls are detected and progressively filtered, to both reduce data communication needs and to improve detection accuracy. The filtering occurs across different stages of the cloud system. This permits flexible experimentation, for example when a new call or false positive is received. Furthermore, to balance the loss of data from aggressive filtering (call recognition), novel overview techniques are employed to provide data summaries. In this way an end user can receive alerts that a toad call is present, the system can be tuned on the fly, and the user can view summary data to have confidence that the system is functioning correctly. The system has been deployed and is in day-to-day use. The novel approaches taken are applicable to other edge computing systems, which analyse large data streams looking for infrequent events and the system has application for monitoring other vocal species.
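The multi-stage filtering idea described above can be sketched as a simple detector cascade: each stage is stricter (and costlier) than the last, and only surviving events are forwarded toward the cloud. The stages, features, and thresholds below are hypothetical, not the recognizers used in the deployed system:

```python
def staged_detector(events, stages):
    """Run a cascade of increasingly selective filters; only events that
    survive one stage are forwarded to the next, which reduces the data
    shipped from edge sensors to the cloud."""
    kept = list(events)
    for stage in stages:
        kept = [e for e in kept if stage(e)]
    return kept

# Hypothetical acoustic events with an energy level and a pulse rate.
events = [
    {"energy": 0.9, "pulse_rate": 12},   # loud, toad-like pulse rate
    {"energy": 0.2, "pulse_rate": 12},   # too quiet: dropped at the edge
    {"energy": 0.8, "pulse_rate": 30},   # loud but wrong pulse rate
]
stages = [
    lambda e: e["energy"] > 0.5,             # cheap edge-side energy gate
    lambda e: 10 <= e["pulse_rate"] <= 15,   # stricter cloud-side check
]
detections = staged_detector(events, stages)
```

Because each later stage sees only what earlier stages passed, the cascade trades a small risk of early false negatives for a large reduction in communication and cloud-side compute — the balance the summary-overview techniques in the paper are designed to monitor.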
Pages: 67-74
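The abstract describes a multi-stage analysis pipeline in which candidate calls are detected and then progressively filtered, so that cheap checks run early (at the edge) and more expensive recognition runs later (in the cloud). The sketch below is not the authors' code; the stage names, features, and thresholds are illustrative assumptions that only show the progressive-filtering structure.

```python
# Hedged sketch (assumed, not from the paper): progressive multi-stage
# filtering of candidate call detections. Each stage discards candidates,
# so later, costlier stages see less data.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Detection:
    """A candidate toad-call event with a few illustrative features."""
    energy: float         # signal energy of the candidate window
    pulse_rate_hz: float  # cane-toad calls have a characteristic pulse rate
    score: float          # classifier confidence in [0, 1]

def stage_energy(d: Detection) -> bool:
    # Stage 1 (edge): cheaply drop obvious background noise.
    return d.energy > 0.1

def stage_pulse_rate(d: Detection) -> bool:
    # Stage 2: keep only candidates in an assumed pulse-rate band.
    return 10.0 <= d.pulse_rate_hz <= 20.0

def stage_classifier(d: Detection) -> bool:
    # Stage 3 (cloud): final, more expensive recognition step.
    return d.score >= 0.8

def run_pipeline(detections: List[Detection],
                 stages: List[Callable[[Detection], bool]]) -> List[Detection]:
    """Apply each stage in order; only survivors reach the next stage."""
    for stage in stages:
        detections = [d for d in detections if stage(d)]
    return detections

if __name__ == "__main__":
    candidates = [
        Detection(energy=0.05, pulse_rate_hz=15.0, score=0.90),  # too quiet
        Detection(energy=0.50, pulse_rate_hz=40.0, score=0.90),  # wrong pulse rate
        Detection(energy=0.50, pulse_rate_hz=14.0, score=0.95),  # passes all stages
    ]
    alerts = run_pipeline(candidates,
                          [stage_energy, stage_pulse_rate, stage_classifier])
    print(len(alerts))  # only the last candidate survives
```

Because stages are ordinary predicates in a list, a stage can be retuned or swapped without touching the others — one plausible reading of the "flexible experimentation" the abstract emphasises.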
Cited: 9
Journal
2018 IEEE 14th International Conference on e-Science (e-Science)