Evaluating Layer-Wise Relevance Propagation Explainability Maps for Artificial Neural Networks
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00107
E. Ranguelova, E. Pauwels, J. Berkhout
Layer-wise relevance propagation (LRP) heatmaps aim to provide a graphical explanation of a classifier's decisions. This could be of great benefit to scientists, helping them trust complex black-box models and gain insight from their data. LRP heatmaps tested on benchmark datasets are reported to correlate significantly with interpretable image features. In this work, we investigate these claims and propose to refine them.
{"title":"Evaluating Layer-Wise Relevance Propagation Explainability Maps for Artificial Neural Networks","authors":"E. Ranguelova, E. Pauwels, J. Berkhout","doi":"10.1109/eScience.2018.00107","DOIUrl":"https://doi.org/10.1109/eScience.2018.00107","url":null,"abstract":"Layer-wise relevance propagation (LRP) heatmaps aim to provide graphical explanation for decisions of a classifier. This could be of great benefit to scientists for trusting complex black-box models and getting insights from their data. The LRP heatmaps tested on benchmark datasets are reported to correlate significantly with interpretable image features. In this work, we investigate these claims and propose to refine them.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"26 1","pages":"377-378"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78233940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Open Knowledge Discovery and Data Mining from Patient Forums
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00119
A. Dirkson, S. Verberne, G. Oortmerssen, H. Gelderblom, Wessel Kraaij
n/a
{"title":"Open Knowledge Discovery and Data Mining from Patient Forums","authors":"A. Dirkson, S. Verberne, G. Oortmerssen, H. Gelderblom, Wessel Kraaij","doi":"10.1109/eScience.2018.00119","DOIUrl":"https://doi.org/10.1109/eScience.2018.00119","url":null,"abstract":"n/a","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"31 1","pages":"397-398"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78607378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survey on Research Software Engineering in the Netherlands
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00017
B. V. Werkhoven, T. Bakker, Olivier Philippe, S. Hettrick
This paper presents a brief overview of the Research Software Engineering landscape in the Netherlands and includes a summary of the results from a survey held in December 2017 in the Netherlands and several other countries. The results show that best practices are widely adopted. Research software is produced by small teams or individuals, is often used for scientific publications, and is frequently acknowledged in publications.
{"title":"Survey on Research Software Engineering in the Netherlands","authors":"B. V. Werkhoven, T. Bakker, Olivier Philippe, S. Hettrick","doi":"10.1109/eScience.2018.00017","DOIUrl":"https://doi.org/10.1109/eScience.2018.00017","url":null,"abstract":"This paper presents a brief overview of the Research Software Engineering landscape in the Netherlands and includes a summary of the results from a survey held in December 2017 in the Netherlands and several other countries. The results show that best practices are widely adopted. Research software is produced by small teams or individuals, is often used for scientific publications, and is frequently acknowledged in publications.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"34 1","pages":"38-39"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72918293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Modelling Implicit Content Networks to Track Information Propagation Across Media Sources to Analyze News Events
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00136
Anirudh Joshi, R. Sinnott
With the rise of the Internet as the premier news source for billions of people around the world, the propagation of news media online now influences many critical decisions made by society every day. Fake news is now a mainstream concern. In the context of news propagation, recent work in media analysis largely focuses on extracting clusters, news events and stories, or on tracking links and conserved sentences at an aggregate level between sources. However, the insight these approaches provide for analysis and end-user context is limited. To tackle this, we present an approach that models, at a semantic level, the implicit content networks inherent within the news event clusters users see daily, through the generation of semantic content indexes. The approach is based on an end-to-end unsupervised machine learning system trained on real-life news data, combined with algorithms that generate useful contextual views of the sources and the inter-relationships of news events. We illustrate how the approach tracks conserved semantic context using a combination of machine learning techniques, including document vectors, k-nearest neighbors and hierarchical agglomerative clustering. We demonstrate the system by training semantic vector models on real-world data taken from the Signal News dataset. We quantitatively evaluate the performance against existing state-of-the-art systems to demonstrate the end-to-end capability. We then qualitatively demonstrate the usefulness of a news-event-centered semantic content index graph for end-user applications. This is evaluated with respect to the goal of generating rich contextual interconnections and providing differential background on how news media sources report, parrot and position information on ostensibly identical news events.
{"title":"Modelling Implicit Content Networks to Track Information Propagation Across Media Sources to Analyze News Events","authors":"Anirudh Joshi, R. Sinnott","doi":"10.1109/eScience.2018.00136","DOIUrl":"https://doi.org/10.1109/eScience.2018.00136","url":null,"abstract":"With the rise of the Internet as the premier news source for billions of people around the world, the propagation of news media online now influences many critical decisions made by society every day. Fake news is now a mainstream concern. In the context of news propagation, recent works in media analysis largely focus on extracting clusters, news events, stories or tracking links or conserved sentences at aggregate levels between sources. However, the insight provided by these approaches is limited for analysis and context for end users. To tackle this, we present an approach to model implicit content networks at a semantic level that is inherent within news event clusters as seen by users on a daily basis through the generation of semantic content indexes. The approach is based on an end-to-end unsupervised machine learning system trained on real-life news data that combine together with algorithms to generate useful contextual views of the sources and the inter-relationships of news events. We illustrate how the approach is able to track conserved semantic context through the use of a combination of machine learning techniques, including document vectors, k-nearest neighbors and the use of hierarchical agglomerative clustering. We demonstrate the system by training semantic vector models on realistic real-world data taken from the Signal News dataset. We quantitatively evaluate the performance against existing state of the art systems to demonstrate the end-to-end capability. We then qualitatively demonstrate the usefulness of a news event centered semantic content index graph for end-user applications. This is evaluated with respect to the goal of generating rich contextual interconnections and providing differential background on how news media sources report, parrot and position information on ostensibly identical news events.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"27 1","pages":"475-485"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77177912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Role of Data Stewardship in Software Sustainability and Reproducibility
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00009
Maria J. Cruz, Shalini Kurapati, Yasemin Turkyilmaz-van der Velden
Software and computational tools are instrumental for scientific investigation in today's digitized research environment. Despite this crucial role, the path towards implementing best practices for reproducibility and sustainability of research software is challenging. Delft University of Technology has recently begun a novel data stewardship initiative, providing disciplinary support for research data management, one of whose main aims is achieving reproducibility of scientific results. In this paper, we explore the potential of data stewardship for supporting software reproducibility and sustainability as well. Recently, we gathered the key stakeholders on this topic (researchers, research software engineers, and data stewards) in a workshop setting to understand the challenges and barriers, the support required to achieve software sustainability and reproducibility, and how all three parties can work together efficiently. Based on the insights from the workshop, as well as our professional experience as data stewards, we draw conclusions on possible ways forward to achieve the important goal of software reproducibility and sustainability through the coordinated efforts of the key stakeholders.
{"title":"The Role of Data Stewardship in Software Sustainability and Reproducibility","authors":"Maria J. Cruz, Shalini Kurapati, Yasemin Turkyilmaz-van der Velden","doi":"10.1109/eScience.2018.00009","DOIUrl":"https://doi.org/10.1109/eScience.2018.00009","url":null,"abstract":"Software and computational tools are instrumental for scientific investigation in today's digitized research environment. Despite this crucial role, the path towards implementing best practices to achieve reproducibility and sustainability of research software is challenging. Delft University of Technology has begun recently a novel initiative of data stewardship - disciplinary support for research data management, one of the main aims of which is achieving reproducibility of scientific results in general. In this paper, we aim to explore the potential of data stewardship for supporting software reproducibility and sustainability as well. Recently, we gathered the key stakeholders of the topic (i.e. researchers, research software engineers, and data stewards) in a workshop setting to understand the challenges and barriers, the support required to achieve software sustainability and reproducibility, and how all the three parties can efficiently work together. Based on the insights from the workshop, as well as our professional experience as data stewards, we draw conclusions on possible ways forward to achieve the important goal of software reproducibility and sustainability through coordinated efforts of the key stakeholders.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"6 1","pages":"1-8"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79714526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TI-One: Active Research Data Management in a Modern Philosophy Department
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00070
Gioele Barabucci, Mark Eschweiler, Andreas Speer
When it comes to managing their digital data, researchers are often left to their own devices, with little guidance from their hosting institution. These problems are exacerbated in the humanities, in which each project is seen as a separate world that needs special solutions, leading to data losses and an accumulation of technical debt. This paper presents our vision and progress on TI-One: a department-wide system that guides the management of the data of the whole Thomas-Institut, part of the Philosophy Faculty of the University of Cologne. The novel features of TI-One are 1) a department-wide set of guidelines and conventions, 2) the materialization of live data from non-file sources (e.g., DBs), 3) a versioning system with extended metadata that creates an almost effortless path from automated backups to proper long-term archival of research data.
{"title":"TI-One: Active Research Data Management in a Modern Philosophy Department","authors":"Gioele Barabucci, Mark Eschweiler, Andreas Speer","doi":"10.1109/eScience.2018.00070","DOIUrl":"https://doi.org/10.1109/eScience.2018.00070","url":null,"abstract":"When it comes to managing their digital data, researchers are often left to their own devices, with little guidance from their hosting institution. These problems are exacerbated in the humanities, in which each project is seen as a separate world that needs special solutions, leading to data losses and an accumulation of technical debt. This paper presents our vision and progress on TI-One: a department-wide system that guides the management of the data of the whole Thomas-Institut, part of the Philosophy Faculty of the University of Cologne. The novel features of TI-One are 1) a department-wide set of guidelines and conventions, 2) the materialization of live data from non-file sources (e.g., DBs), 3) a versioning system with extended metadata that creates an almost effortless path from automated backups to proper long-term archival of research data.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"23 1","pages":"314-315"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81707348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simulating HEP Workflows on Heterogeneous Architectures
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00087
C. Leggett, I. Shapoval
The next generation of supercomputing facilities, such as Oak Ridge's Summit and Lawrence Livermore's Sierra, show an increasing use of GPGPUs and other accelerators in order to achieve their high FLOP counts. This trend will only grow with exascale facilities. In general, High Energy Physics computing workflows have made little use of GPUs due to the relatively small fraction of kernels that run efficiently on GPUs, and the expense of rewriting code for rapidly evolving GPU hardware. However, the computing requirements for high-luminosity LHC are enormous, and it will become essential to be able to make use of supercomputing facilities that rely heavily on GPUs and other accelerator technologies. ATLAS has already developed an extension to AthenaMT, its multithreaded event processing framework, that enables the non-intrusive offloading of computations to external accelerator resources, and is developing strategies to schedule the offloading efficiently. Before investing heavily in writing many kernels, we need to better understand the performance metrics and throughput bounds of the workflows with various accelerator configurations. This can be done by simulating the workflows, using real metrics for task interdependencies and timing, as we vary fractions of offloaded tasks, latencies, data conversion speeds, memory bandwidths, and accelerator offloading parameters such as CPU/GPU ratios and speeds. We present the results of these studies, which will be instrumental in directing effort to make the ATLAS framework, kernels and workflows run efficiently on exascale facilities.
{"title":"Simulating HEP Workflows on Heterogeneous Architectures","authors":"C. Leggett, I. Shapoval","doi":"10.1109/eScience.2018.00087","DOIUrl":"https://doi.org/10.1109/eScience.2018.00087","url":null,"abstract":"The next generation of supercomputing facilities, such as Oak Ridge's Summit and Lawrence Livermore's Sierra, show an increasing use of GPGPUs and other accelerators in order to achieve their high FLOP counts. This trend will only grow with exascale facilities. In general, High Energy Physics computing workflows have made little use of GPUs due to the relatively small fraction of kernels that run efficiently on GPUs, and the expense of rewriting code for rapidly evolving GPU hardware. However, the computing requirements for high-luminosity LHC are enormous, and it will become essential to be able to make use of supercomputing facilities that rely heavily on GPUs and other accelerator technologies. ATLAS has already developed an extension to AthenaMT, its multithreaded event processing framework, that enables the non-intrusive offloading of computations to external accelerator resources, and is developing strategies to schedule the offloading efficiently. Before investing heavily in writing many kernels, we need to better understand the performance metrics and throughput bounds of the workflows with various accelerator configurations. This can be done by simulating the workflows, using real metrics for task interdependencies and timing, as we vary fractions of offloaded tasks, latencies, data conversion speeds, memory bandwidths, and accelerator offloading parameters such as CPU/GPU ratios and speeds. We present the results of these studies, which will be instrumental in directing effort to make the ATLAS framework, kernels and workflows run efficiently on exascale facilities.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"6 1","pages":"343-343"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88840048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Exascale Computing for High Energy Physics: The ATLAS Experience at ORNL
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00086
V. Ananthraj, K. De, S. Jha, A. Klimentov, D. Oleynik, S. Oral, André Merzky, R. Mashinistov, S. Panitkin, P. Svirin, M. Turilli, J. Wells, Sean R. Wilkinson
Traditionally, the ATLAS experiment at the Large Hadron Collider (LHC) has utilized distributed resources as provided by the Worldwide LHC Computing Grid (WLCG) to support data distribution, data analysis and simulations. For example, the ATLAS experiment continuously uses a geographically distributed grid of approximately 200,000 cores (250,000 cores at peak), amounting to over 1,000 million core-hours per year, to process, simulate, and analyze its data (today's total ATLAS data volume is more than 300 PB). After the early success in discovering a new particle consistent with the long-awaited Higgs boson, ATLAS is continuing the precision measurements necessary for further discoveries. The planned high-luminosity LHC upgrade and the related ATLAS detector upgrades, which are necessary for physics searches beyond the Standard Model, pose a serious challenge for ATLAS computing. Data volumes are expected to increase at higher energy and luminosity, causing storage and computing needs to grow at a much higher pace than flat-budget technology evolution allows (see Fig. 1). The need for simulation and analysis will overwhelm the expected capacity of WLCG computing facilities unless the range and precision of physics studies are curtailed.
{"title":"Towards Exascale Computing for High Energy Physics: The ATLAS Experience at ORNL","authors":"V. Ananthraj, K. De, S. Jha, A. Klimentov, D. Oleynik, S. Oral, André Merzky, R. Mashinistov, S. Panitkin, P. Svirin, M. Turilli, J. Wells, Sean R. Wilkinson","doi":"10.1109/eScience.2018.00086","DOIUrl":"https://doi.org/10.1109/eScience.2018.00086","url":null,"abstract":"Traditionally, the ATLAS experiment at Large Hadron Collider (LHC) has utilized distributed resources as provided by the Worldwide LHC Computing Grid (WLCG) to support data distribution, data analysis and simulations. For example, the ATLAS experiment uses a geographically distributed grid of approximately 200,000 cores continuously (250 000 cores at peak), (over 1,000 million core-hours per year) to process, simulate, and analyze its data (todays total data volume of ATLAS is more than 300 PB). After the early success in discovering a new particle consistent with the long-awaited Higgs boson, ATLAS is continuing the precision measurements necessary for further discoveries. Planned high-luminosity LHC upgrade and related ATLAS detector upgrades, that are necessary for physics searches beyond Standard Model, pose serious challenge for ATLAS computing. Data volumes are expected to increase at higher energy and luminosity, causing the storage and computing needs to grow at a much higher pace than the flat budget technology evolution (see Fig. 1). The need for simulation and analysis will overwhelm the expected capacity of WLCG computing facilities unless the range and precision of physics studies will be curtailed.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"27 1","pages":"341-342"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83730700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semantic Software Metadata for Workflow Exploration and Evolution
Pub Date: 2018-10-01 | DOI: 10.1109/ESCIENCE.2018.00132
L. Carvalho, D. Garijo, C. B. Medeiros, Y. Gil
Scientific workflow management systems play a major role in the design, execution and documentation of computational experiments. However, they have limited support for managing workflow evolution and exploration because they lack rich metadata for the software that implements workflow components. Such metadata could be used to support scientists in exploring local adjustments to a workflow, replacing components with similar software, or upgrading components upon release of newer software versions. To address this challenge, we propose OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution. Our approach uses a novel ontology to describe the functionality and evolution through time of any software used to create workflow components. OntoSoft-VFF is implemented as an online catalog that stores semantic metadata for software to enable workflow exploration through understanding of software functionality and evolution. The catalog also supports comparison and semantic search of software metadata. We showcase OntoSoft-VFF using machine learning workflow examples. We validate our approach by testing that a workflow system could compare differences in software metadata, explain software updates and describe the general functionality of workflow steps.
{"title":"Semantic Software Metadata for Workflow Exploration and Evolution","authors":"L. Carvalho, D. Garijo, C. B. Medeiros, Y. Gil","doi":"10.1109/ESCIENCE.2018.00132","DOIUrl":"https://doi.org/10.1109/ESCIENCE.2018.00132","url":null,"abstract":"Scientific workflow management systems play a major role in the design, execution and documentation of computational experiments. However, they have limited support for managing workflow evolution and exploration because they lack rich metadata for the software that implements workflow components. Such metadata could be used to support scientists in exploring local adjustments to a workflow, replacing components with similar software, or upgrading components upon release of newer software versions. To address this challenge, we propose OntoSoft-VFF (Ontology for Software Version, Function and Functionality), a software metadata repository designed to capture information about software and workflow components that is important for managing workflow exploration and evolution. Our approach uses a novel ontology to describe the functionality and evolution through time of any software used to create workflow components. OntoSoft-VFF is implemented as an online catalog that stores semantic metadata for software to enable workflow exploration through understanding of software functionality and evolution. The catalog also supports comparison and semantic search of software metadata. We showcase OntoSoft-VFF using machine learning workflow examples. We validate our approach by testing that a workflow system could compare differences in software metadata, explain software updates and describe the general functionality of workflow steps.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"11 1","pages":"431-441"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85248297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Catching Toad Calls in the Cloud: Commodity Edge Computing for Flexible Analysis of Big Sound Data
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00022
P. Roe, Meriem Ferroudj, M. Towsey, L. Schwarzkopf
Passive acoustic recording has great potential for monitoring both endangered and pest species. However, the automatic analysis of natural sound recordings is challenging due to geographic variation in background sounds in habitats and species calls. We have designed and deployed an acoustic sensor network constituting an early warning system for a vocal invasive species, in particular cane toads. The challenging nature of recognising toad calls and the big data arising from sound recording gave rise to a novel edge computing system which permits both effective monitoring and flexible experimentation. This is achieved through a multi-stage analysis system in which calls are detected and progressively filtered, both to reduce data communication needs and to improve detection accuracy. The filtering occurs across different stages of the cloud system. This permits flexible experimentation, for example when a new call or false positive is received. Furthermore, to balance the loss of data from aggressive filtering (call recognition), novel overview techniques are employed to provide data summaries. In this way an end user can receive alerts that a toad call is present, the system can be tuned on the fly, and the user can view summary data to have confidence that the system is functioning correctly. The system has been deployed and is in day-to-day use. The novel approaches taken are applicable to other edge computing systems that analyse large data streams looking for infrequent events, and the system has application for monitoring other vocal species.
{"title":"Catching Toad Calls in the Cloud: Commodity Edge Computing for Flexible Analysis of Big Sound Data","authors":"P. Roe, Meriem Ferroudj, M. Towsey, L. Schwarzkopf","doi":"10.1109/eScience.2018.00022","DOIUrl":"https://doi.org/10.1109/eScience.2018.00022","url":null,"abstract":"Passive acoustic recording has great potential for monitoring both endangered and pest species. However, the automatic analysis of natural sound recordings is challenging due to geographic variation in background sounds in habitats and species calls. We have designed and deployed an acoustic sensor network constituting an early warning system for a vocal invasive species, in particular cane toads. The challenging nature of recognising toad calls and the big data arising from sound recording gave rise to a novel edge computing system which permits both effective monitoring and flexible experimentation. This is achieved through a multi-stage analysis system in which calls are detected and progressively filtered, to both reduce data communication needs and to improve detection accuracy. The filtering occurs across different stages of the cloud system. This permits flexible experimentation, for example when a new call or false positive is received. Furthermore, to balance the loss of data from aggressive filtering (call recognition), novel overview techniques are employed to provide data summaries. In this way an end user can receive alerts that a toad call is present, the system can be tuned on the fly, and the user can view summary data to have confidence that the system is functioning correctly. The system has been deployed and is in day-to-day use. The novel approaches taken are applicable to other edge computing systems, which analyse large data streams looking for infrequent events and the system has application for monitoring other vocal species.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"58 1","pages":"67-74"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80770774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}