Interactivity, Distributed Workflows, and Thick Provenance: A Review of Challenges Confronting Digital Humanities Research Objects
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00067
Katrina Fenlon
Despite the rapid growth of digital scholarship in the humanities, most existing humanities research infrastructures lack adequate support for the creation, management, sharing, maintenance, and preservation of complex, networked digital objects. Research Objects (ROs) have mainly been applied to scientific research workflows, but the RO model and parallel approaches have gained enough uptake in the humanities to suggest their potential to undergird sustainable, networked humanities research infrastructure. This paper reviews several compelling humanities applications of the RO model and closely related models, in platforms for data sharing, computational text analysis, collaborative annotation, and digital and semantic publishing, as well as in domain repositories. The paper identifies challenges confronting the broad application of ROs in the humanities, challenges that will confront any emergent model for humanities data- or workflow-packaging and publication, and suggests implications for implementations in humanities cyberinfrastructure.
{"title":"Interactivity, Distributed Workflows, and Thick Provenance: A Review of Challenges Confronting Digital Humanities Research Objects","authors":"Katrina Fenlon","doi":"10.1109/eScience.2019.00067","DOIUrl":"https://doi.org/10.1109/eScience.2019.00067","url":null,"abstract":"Despite the rapid growth of digital scholarship in the humanities, most existing humanities research infrastructures lack adequate support for the creation, management, sharing, maintenance, and preservation of complex, networked digital objects. Research Objects (ROs) have mainly been applied to scientific research workflows, but the RO model and parallel approaches have gained enough uptake in the humanities to suggest their potential to undergird sustainable, networked humanities research infrastructure. This paper reviews several compelling applications in the humanities of RO and closely related models in platforms for data sharing, computational text analysis, collaborative annotation, digital and semantic publishing, and in domain repositories. The paper identifies challenges confronting the broad application of ROs in the humanities—which challenges will confront any emergent model for humanities data-or workflow-packaging and publication—and suggests implications for implementations in humanities cyberinfrastructure.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124639179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transkribus. A Platform for Automated Text Recognition and Searching of Historical Documents
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00060
S. Colutto, Philip Kahle, Günter Hackl, Günter Mühlberger
The Transkribus platform provides services and tools for the digitization, transcription, recognition, and searching of historical documents. It is the only platform worldwide where non-technical users can train their own machine-learning-based neural networks and apply them to their documents in order to generate automated transcriptions and make the documents searchable via keyword spotting. Transkribus is used by thousands of users and hundreds of archives, libraries, and research groups all over the world. In this paper we briefly describe the platform's approach in terms of its underlying business and governance model, as well as its technical aspects.
{"title":"Transkribus. A Platform for Automated Text Recognition and Searching of Historical Documents","authors":"S. Colutto, Philip Kahle, Günter Hackl, Günter Mühlberger","doi":"10.1109/eScience.2019.00060","DOIUrl":"https://doi.org/10.1109/eScience.2019.00060","url":null,"abstract":"The Transkribus platform provides services and tools for the digitization, transcription, recognition and searching of historical documents. It is the only platform worldwide were non-technical users are enabled to train their own machine learning based neural networks and to apply them on their documents in order to generate an automated transcription and to make them searchable via keyword spotting. Transkribus is used by thousands of users and hundreds of archives, libraries, and research groups all over the world. In this paper we briefly describe the approach of the platform in terms of the underlying business and governance model, as well as the technical aspects of the platform.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130574400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00034
S. Benz, Hogeun Park, Jiaxin Li, Daniel Crawl, J. Block, M. Nguyen, I. Altintas
In summer 2017, close to one million Rohingya, an ethnic minority group in Myanmar, fled to Bangladesh due to the persecution of Muslims. This large influx of refugees settled around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public services. While Non-Governmental Organizations (NGOs) such as the Refugee Relief and Repatriation Commissioner (RRCC) conducted a series of counting exercises to understand the demographics of the refugees, our understanding of camp formation is still limited. Since household-type surveys are time-consuming and do not capture geo-information, we propose to use a combination of high-resolution satellite imagery and machine learning (ML) techniques to assess the spatiotemporal dynamics of the refugee camp. Four Very High Resolution (VHR) images (WorldView-2) are analyzed to compare the camp pre- and post-influx. Using deep learning and unsupervised learning, we organized the satellite image tiles of a given region into geographically relevant categories. Specifically, we used a pre-trained convolutional neural network (CNN) to extract features from the image tiles, followed by cluster analysis to segment the extracted features into similar groups. Our results show that the size of the built-up area increased significantly, from 0.4 km² in January 2016 and 1.5 km² in May 2017 to 8.9 km² in December 2017 and 9.5 km² in February 2018. Through the benefits of unsupervised machine learning, we further detected the densification of the refugee camp over time and were able to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions, and is thus a useful tool to enhance our understanding of the structure of refugee camps, enabling resources for humanitarian needs to be allocated to the most vulnerable populations.
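As a rough illustration of the pipeline this abstract describes (pre-trained CNN feature extraction over image tiles followed by clustering), the sketch below uses a ResNet-18 backbone and k-means; the backbone choice, the stand-in tiles, and the number of clusters are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative sketch (not the authors' code): extract features from image
# tiles with a pre-trained CNN, then cluster the tiles into land-cover groups.
# Backbone, tile size, and the number of clusters are assumptions.
import numpy as np
import torch
from torchvision import models, transforms
from PIL import Image
from sklearn.cluster import KMeans

# Pre-trained ResNet-18 with the classification head removed, so each tile
# yields a 512-dimensional feature vector.
backbone = models.resnet18(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def tile_features(tiles):
    """Return an (n_tiles, 512) array of CNN features for a list of PIL tiles."""
    feats = []
    with torch.no_grad():
        for tile in tiles:
            x = preprocess(tile.convert("RGB")).unsqueeze(0)
            feats.append(feature_extractor(x).flatten().numpy())
    return np.stack(feats)

# Stand-in tiles: random RGB arrays in place of real WorldView-2 image tiles.
rng = np.random.default_rng(0)
tiles = [Image.fromarray(rng.integers(0, 256, (224, 224, 3), dtype=np.uint8))
         for _ in range(20)]

# Unsupervised grouping of tiles into geographically relevant categories
# (e.g. built-up vs. vegetation); k=4 is an arbitrary illustrative choice.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(tile_features(tiles))
print(labels)
```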
{"title":"Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery","authors":"S. Benz, Hogeun Park, Jiaxin Li, Daniel Crawl, J. Block, M. Nguyen, I. Altintas","doi":"10.1109/eScience.2019.00034","DOIUrl":"https://doi.org/10.1109/eScience.2019.00034","url":null,"abstract":"In summer 2017, close to one million Rohingya, an ethnic minority group in Myanmar, have fled to Bangladesh due to the persecution of Muslims. This large influx of refugees has resided around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public service. While Non-Governmental Organizations (NGOs) such as Refugee Relief and Repatriation Commissioner (RRCC) conducted a series of counting exercises to understand the demographics of refugees, our understanding of camp formation is still limited. Since the household type survey is time-consuming and does not entail geo-information, we propose to use a combination of high-resolution satellite imagery and machine learning (ML) techniques to assess the spatiotemporal dynamics of the refugee camp. Four Very-High Resolution (VHR) images (i.e., World View-2) are analyze to compare the camp pre-and post-influx. Using deep learning and unsupervised learning, we organized the satellite image tiles of a given region into geographically relevant categories. Specifically, we used a pre-trained convolutional neural network (CNN) to extract features from the image tiles, followed by cluster analysis to segment the extracted features into similar groups. Our results show that the size of the built-up area increased significantly from 0.4 km² in January 2016 and 1.5 km² in May 2017 to 8.9 km² in December 2017 and 9.5 km² in February 2018. Through the benefits of unsupervised machine learning, we further detected the densification of the refugee camp over time and were able to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions. And thus a useful tool to enhance our understanding of the structure of refugee camps, which enables us to allocate resources for humanitarian needs to the most vulnerable populations.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130288095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reference Exascale Architecture
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00063
Martin Bobák, Balázs Somosköi, Mara Graziani, M. Heikkurinen, Maximilian Höb, Jan Schmidt, L. Hluchý, A. Belloum, R. Cushing, J. Meizner, P. Nowakowski, V. Tran, O. Habala, J. Maassen
While political commitments to building exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational, and skills-related challenges. The key technical challenges relate to the availability of data. While the first exascale machines are likely to be built within a single site, the input data often cannot be stored at a single site. Alongside handling extremely large amounts of data, an exascale system has to process data from different sources, support accelerated computing, handle a high volume of requests per day, minimize the size of data flows, and be extensible with respect to continuously growing data volumes as well as growing numbers of parallel requests. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: a virtualization layer, a distributed virtual file system, and a manager of computing resources. Its main property is modularity, which is achieved by containerization at two levels: 1) application containers - containerization of scientific workflows, and 2) micro-infrastructure - containerization of the extreme-large-data, service-oriented infrastructure. The paper also presents an instantiation of the reference architecture - the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS) - and discusses its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. This work presents the requirements and the derived architecture, as well as the five pilot use cases that it made possible.
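The application-container level described above (containerization of individual scientific workflow steps with data mounted from a shared file system) could be sketched roughly as follows; the image name, paths, and use of the Docker SDK for Python are illustrative assumptions, not part of the PROCESS implementation.

```python
# Minimal sketch of the "application container" level: one workflow step runs
# in its own container, with input/output data mounted from a shared (in the
# reference architecture, distributed) file system. Image, command, and paths
# are placeholders; this is not PROCESS code.
import docker

client = docker.from_env()

def run_workflow_step(image, command, data_dir):
    """Run one containerized workflow step with a data volume mounted at /data."""
    logs = client.containers.run(
        image,
        command,
        volumes={data_dir: {"bind": "/data", "mode": "rw"}},
        remove=True,  # clean up the container once the step finishes
    )
    return logs.decode()

# Hypothetical step: preprocess an input file living on the shared file system.
print(run_workflow_step(
    image="python:3.11-slim",
    command=["python", "-c", "print('preprocessing /data/input.dat')"],
    data_dir="/mnt/dvfs/experiment-42",
))
```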
{"title":"Reference Exascale Architecture","authors":"Martin Bobák, Balázs Somosköi, Mara Graziani, M. Heikkurinen, Maximilian Höb, Jan Schmidt, L. Hluchý, A. Belloum, R. Cushing, J. Meizner, P. Nowakowski, V. Tran, O. Habala, J. Maassen","doi":"10.1109/eScience.2019.00063","DOIUrl":"https://doi.org/10.1109/eScience.2019.00063","url":null,"abstract":"While political commitments for building exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational and skills-related challenges. The key technical challenges are related to the availability of data. While the first exascale machines are likely to be built within a single site, the input data is in many cases impossible to store within a single site. Alongside handling of extreme-large amount of data, the exascale system has to process data from different sources, support accelerated computing, handle high volume of requests per day, minimize the size of data flows, and be extensible in terms of continuously increasing data as well as increase in parallel requests being sent. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: virtualization layer, distributed virtual file system, and manager of computing resources. Its main property is modularity which is achieved by containerization at two levels: 1) application containers - containerization of scientific workflows, 2) micro-infrastructure - containerization of extreme-large data service-oriented infrastructure. The paper also presents an instantiation of the reference architecture - the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS) and discuss its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. This work will present the requirements and the derived architecture as well as the 5 use cases pilots that it made possible.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129280917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reliability-Aware and Graph-Based Approach for Rank Aggregation of Biological Data
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00022
Pierre Andrieu, Bryan Brancotte, L. Bulteau, Sarah Cohen-Boulakia, A. Denise, A. Pierrot, Stéphane Vialette
Massive biological datasets are available in public databases and can be queried through portals using keyword queries, which return ranked lists of answers. However, properly querying such portals remains difficult, since various formulations of the same query can be considered (e.g., using synonyms). Consequently, users have to manually combine several lists of hundreds of answers into one list. Rank aggregation techniques are particularly well suited to this context, as they take a set of ranked elements (rankings) and provide a consensus, that is, a single ranking which is the "closest" to the input rankings. However, the rank aggregation problem is NP-hard in most cases, and using an exact algorithm is currently not possible for more than a few dozen elements. A plethora of heuristics have thus been proposed, whose behaviour is, by nature, difficult to anticipate: given a set of input rankings, one cannot guarantee how far from an exact solution the consensus ranking provided by a heuristic will be. The two challenges we tackle in this paper are the following: (i) providing an approach based on a pre-processing step that decomposes large datasets into smaller ones on which high-quality algorithms can be run, and (ii) providing users with information on the robustness of the positions of elements in the consensus ranking produced. Our approach not only rests on mathematical foundations, offering guarantees on the computed result, but has also been implemented in a real system available to the life science community and tested on various real use cases.
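As a minimal illustration of the rank-aggregation setting described above, the sketch below builds a Borda-style heuristic consensus and measures its total Kendall-tau distance to the input rankings; this is a generic baseline, not the paper's graph-based, reliability-aware method, and the gene names are invented.

```python
# Small illustration of rank aggregation: a consensus ranking should minimize
# its total Kendall-tau distance to the input rankings. A Borda-style
# heuristic (average position) gives a candidate consensus; the paper's actual
# contribution (decomposition + robustness information) is not implemented here.
from itertools import combinations
from statistics import mean

def kendall_tau(r1, r2):
    """Number of element pairs ordered differently by the two rankings."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(pos1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )

def borda_consensus(rankings):
    """Heuristic consensus: order elements by their average position."""
    elements = rankings[0]
    avg_pos = {e: mean(r.index(e) for r in rankings) for e in elements}
    return sorted(elements, key=avg_pos.get)

# Three rankings of the same gene list, e.g. from three query formulations.
rankings = [
    ["geneA", "geneB", "geneC", "geneD"],
    ["geneB", "geneA", "geneC", "geneD"],
    ["geneA", "geneC", "geneB", "geneD"],
]
consensus = borda_consensus(rankings)
print(consensus, sum(kendall_tau(consensus, r) for r in rankings))
```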
{"title":"Reliability-Aware and Graph-Based Approach for Rank Aggregation of Biological Data","authors":"Pierre Andrieu, Bryan Brancotte, L. Bulteau, Sarah Cohen-Boulakia, A. Denise, A. Pierrot, Stéphane Vialette","doi":"10.1109/eScience.2019.00022","DOIUrl":"https://doi.org/10.1109/eScience.2019.00022","url":null,"abstract":"Massive biological datasets are available in public databases and can be queried using portals with keyword queries. Ranked lists of answers are obtained by users. However, properly querying such portals remains difficult since various formulations of the same query can be considered (e.g., using synonyms). Consequently, users have to manually combine several lists of hundreds of answers into one list. Rank aggregation techniques are particularly well-fitted to this context as they take in a set of ranked elements (rankings) and provide a consensus, that is, a single ranking which is the \"closest\" to the input rankings. However, the problem of rank aggregation is NP-hard in most cases. Using an exact algorithm is currently not possible for more than a few dozens of elements. A plethora of heuristics have thus been proposed which behaviour are, by essence, difficult to anticipate: given a set of input rankings, one cannot guarantee how far from an exact solution the consensus ranking provided by an heuristic will be. The two challenges we want to tackle in this paper are the following: (i) providing an approach based on a pre-process to decompose large data sets into smaller ones where high-quality algorithms can be run and (ii) providing information to users on the robustness of the positions of elements in the consensus ranking produced. Our approach not only lies in mathematical bases, offering guarantees on the result computed but it has also been implemented in a real system available to life science community and tested on various real use cases.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126880702","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Engagement and Performance Operations Center: EPOC
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00086
Edward R. Moynihan, J. Schopf, J. Zurawski
In 2018, the US National Science Foundation (NSF) funded the Engagement and Performance Operations Center (EPOC), a joint project between Indiana University (IU) and the Department of Energy's Energy Sciences Network (ESnet), to work with domain scientists to accelerate the ability of distributed collaborations to share data in order to reach broader science goals. The aim of this funding was to create an operations center for engagement - including the definition of formal processes, the tracking of engagements, and funded staff rather than simply best-effort work by volunteers - with the goal of enabling digital societies to better share scientific data.
{"title":"The Engagement and Performance Operations Center: EPOC","authors":"Edward R. Moynihan, J. Schopf, J. Zurawski","doi":"10.1109/eScience.2019.00086","DOIUrl":"https://doi.org/10.1109/eScience.2019.00086","url":null,"abstract":"In 2018, the US National Science Foundation (NSF) funded the Engagement and Performance Operations Center (EPOC), a joint project between Indiana University (IU) and the Department of Energy's Energy Science Network (ESnet), to work with domain scientists to accelerate the ability of distributed collaborations to share data in order to reach broader science goals. The goal of this funding was to create an operations center for engagement - including definition of formal processes, tracking of engagements, and funded staff, not simply best effort by volunteers, with a goal of enabling digital societies to better share scientific data.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123259454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ENVRI-FAIR - Interoperable Environmental FAIR Data and Services for Society, Innovation and Research
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00038
A. Petzold, A. Asmi, A. Vermeulen, G. Pappalardo, D. Bailo, D. Schaap, H. Glaves, U. Bundke, Zhiming Zhao
ENVRI-FAIR is a recently launched project of the European Union's Horizon 2020 program (EU H2020), connecting the cluster of European Environmental Research Infrastructures (ENVRI) to the European Open Science Cloud (EOSC). The overarching goal of ENVRI-FAIR is that all participating research infrastructures (RIs) will provide a set of interoperable FAIR data services that enhance the efficiency and productivity of researchers, support innovation, enable data- and knowledge-based decisions, and connect the ENVRI cluster to the EOSC. This goal will be reached by: (1) defining community policies and standards across all stages of the data life cycle, aligned with the wider European policies and with international developments; (2) creating for all participating RIs sustainable, transparent, and auditable data services for each stage of the data life cycle, following the FAIR principles; (3) implementing prototypes for testing pre-production services at each RI, leading to a catalogue of prepared services; (4) exposing the complete set of thematic data services and tools of the ENVRI cluster to the EOSC catalogue of services.
{"title":"ENVRI-FAIR - Interoperable Environmental FAIR Data and Services for Society, Innovation and Research","authors":"A. Petzold, A. Asmi, A. Vermeulen, G. Pappalardo, D. Bailo, D. Schaap, H. Glaves, U. Bundke, Zhiming Zhao","doi":"10.1109/eScience.2019.00038","DOIUrl":"https://doi.org/10.1109/eScience.2019.00038","url":null,"abstract":"ENVRI-FAIR is a recently launched project of the European Union's Horizon 2020 program (EU H2020), connecting the cluster of European Environmental Research Infrastructures (ENVRI) to the European Open Science Cloud (EOSC). The overarching goal of ENVRI-FAIR is that all participating research infrastructures (RIs) will provide a set of interoperable FAIR data services that enhance the efficiency and productivity of researchers, support innovation, enable data-and knowledge-based decisions and connect the ENVRI cluster to the EOSC. This goal will be reached by: (1) defining community policies and standards across all stages of the data life cycle, aligned with the wider European policies and with international developments; (2) creating for all participating RIs sustainable, transparent and auditable data services for each stage of the data life cycle, following the FAIR principles; (3) implementing prototypes for testing pre-production services at each RI, leading to a catalogue of prepared services; (4) exposing the complete set of thematic data services and tools of the ENVRI cluster to the EOSC catalogue of services.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126256233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reproducibility by Other Means: Transparent Research Objects
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00066
T. McPhillips, C. Willis, M. Gryk, Santiago Núñez Corrales, Bertram Ludäscher
Research Objects have the potential to significantly enhance the reproducibility of scientific research. One important way Research Objects can do this is by encapsulating the means for re-executing the computational components of studies, thus supporting the new form of reproducibility enabled by digital computing: exact repeatability. However, Research Objects can also make scientific research more reproducible by supporting transparency, a component of reproducibility orthogonal to re-executability. We describe here our vision for making Research Objects more transparent by providing means for disambiguating claims about reproducibility generally, and about computational repeatability specifically. We show how support for science-oriented queries can enable researchers to assess the reproducibility of Research Objects and of the individual methods and results they encapsulate.
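One way to picture the kind of science-oriented query the paper envisions is a check over a Research Object's metadata that separates a claim of exact computational repeatability from a claim of transparency; the field names and the example record below are hypothetical, not a published RO vocabulary.

```python
# Illustrative sketch only: a "science-oriented query" over Research Object
# metadata that distinguishes exact computational repeatability (runnable
# environment captured) from transparency (methods and provenance documented).
# The metadata fields and the record are hypothetical.
REPEATABILITY_FIELDS = {"runtime_image", "pinned_dependencies", "execution_script"}
TRANSPARENCY_FIELDS = {"source_code", "input_data", "provenance_trace", "narrative_description"}

def reproducibility_claims(research_object):
    """Classify which reproducibility-related claims an RO's metadata supports."""
    present = {k for k, v in research_object.items() if v}
    return {
        "exactly_repeatable": REPEATABILITY_FIELDS <= present,
        "transparent": TRANSPARENCY_FIELDS <= present,
    }

# A hypothetical RO that documents its methods and data but ships no runnable
# environment: transparent, yet not exactly repeatable.
ro = {
    "source_code": "https://example.org/analysis.py",
    "input_data": "https://example.org/dataset.csv",
    "provenance_trace": "run-2019-07-01.prov.json",
    "narrative_description": "Cleaning and regression steps described in README",
    "runtime_image": None,
    "pinned_dependencies": None,
    "execution_script": None,
}
print(reproducibility_claims(ro))  # {'exactly_repeatable': False, 'transparent': True}
```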
{"title":"Reproducibility by Other Means: Transparent Research Objects","authors":"T. McPhillips, C. Willis, M. Gryk, Santiago Núñez Corrales, Bertram Ludäscher","doi":"10.1109/eScience.2019.00066","DOIUrl":"https://doi.org/10.1109/eScience.2019.00066","url":null,"abstract":"Research Objects have the potential to significantly enhance the reproducibility of scientific research. One important way Research Objects can do this is by encapsulating the means for re-executing the computational components of studies, thus supporting the new form of reproducibility enabled by digital computing-exact repeatability. However, Research Objects also can make scientific research more reproducible by supporting transparency, a component of reproducibility orthogonal to re-executability. We describe here our vision for making Research Objects more transparent by providing means for disambiguating claims about reproducibility generally, and computational repeatability specifically. We show how support for science-oriented queries can enable researchers to assess the reproducibility of Research Objects and the individual methods and results they encapsulate.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128106299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ForestEyes Project: Can Citizen Scientists Help Rainforests?
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00010
F. B. J. R. Dallaqua, Á. Fazenda, F. Faria
Scientific projects that involve volunteers in analyzing and collecting data, and in contributing their computational resources, known as Citizen Science (CS), have become popular due to advances in information and communication technology (ICT). Many CS projects have been proposed to involve citizens in different knowledge domains such as astronomy, chemistry, mathematics, and physics. This work presents a CS project called ForestEyes, which proposes to track deforestation in rainforests by asking volunteers to analyze and classify remote sensing images. These manually classified data are used as input for training a pattern classifier that will be used to label new remote sensing images. The ForestEyes project was created on the Zooniverse.org CS platform, and to assess the quality of the volunteers' answers, early campaigns were performed with remote sensing images from the Brazilian Legal Amazon (BLA). The results were processed and compared to an oracle classification (PRODES, the Amazon Deforestation Monitoring Project). Two and a half weeks after launch, more than 35,000 answers from 383 volunteers (117 anonymous and 266 registered users) had been received, completing all 2,050 tasks. The results of the ForestEyes campaigns show that volunteers achieved excellent effectiveness in the remote sensing image classification task. Furthermore, these results show that CS might be a powerful tool to quickly obtain a large amount of high-quality labeled data.
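A minimal sketch of how volunteer answers can be aggregated per task and compared against an oracle classification such as PRODES appears below; the task identifiers, labels, and majority-vote rule are illustrative assumptions, not the project's actual processing.

```python
# Minimal sketch (not the project's code): aggregate volunteer answers per
# task by majority vote, then score the aggregate against an oracle
# classification such as PRODES. Task IDs and labels are made up.
from collections import Counter

def majority_vote(answers_per_task):
    """Map each task to the label chosen most often by volunteers."""
    return {task: Counter(votes).most_common(1)[0][0]
            for task, votes in answers_per_task.items()}

def accuracy(predicted, oracle):
    """Fraction of tasks where the aggregated volunteer label matches the oracle."""
    hits = sum(predicted[t] == oracle[t] for t in oracle)
    return hits / len(oracle)

volunteer_answers = {
    "tile_01": ["deforested", "deforested", "forest"],
    "tile_02": ["forest", "forest", "forest"],
    "tile_03": ["deforested", "forest", "forest"],
}
oracle_labels = {"tile_01": "deforested", "tile_02": "forest", "tile_03": "deforested"}

aggregated = majority_vote(volunteer_answers)
print(aggregated)
print(f"agreement with oracle: {accuracy(aggregated, oracle_labels):.2f}")
```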
{"title":"ForestEyes Project: Can Citizen Scientists Help Rainforests?","authors":"F. B. J. R. Dallaqua, Á. Fazenda, F. Faria","doi":"10.1109/eScience.2019.00010","DOIUrl":"https://doi.org/10.1109/eScience.2019.00010","url":null,"abstract":"Scientific projects involving volunteers for analyzing, collecting data, and using their computational resources, known as Citizen Science (CS), have become popular due to advances in information and communication technology (ICT). Many CS projects have been proposed to involve citizens in different knowledge domain such as astronomy, chemistry, mathematics, and physics. This work presents a CS project called ForestEyes, which proposes to track deforestation in rainforests by asking volunteers to analyze and classify remote sensing images. These manually classified data are used as input for training a pattern classifier that will be used to label new remote sensing images. ForestEyes project was created on the Zooniverse.org CS platform, and to attest the quality of the volunteers' answers, were performed early campaigns with remote sensing images from Brazilian Legal Amazon (BLA). The results were processed and compared to an oracle classification (PRODES - Amazon Deforestation Monitoring Project). Two and a half weeks after launch, more than 35,000 answers from 383 volunteers (117 anonymous and 266 registered users) were received, completing all 2050 tasks. The ForestEyes campaigns' results have shown that volunteers achieved excellent effectiveness results in remote sensing image classification task. Furthermore, these results show that CS might be a powerful tool to quickly obtain a large amount of high-quality labeled data.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115510662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contextual Linking between Workflow Provenance and System Performance Logs
2019 15th International Conference on eScience (eScience) | Pub Date: 2019-09-01 | DOI: 10.1109/eScience.2019.00093
Elias el Khaldi Ahanach, Spiros Koulouzis, Zhiming Zhao
When executing scientific workflows, anomalies in workflow behavior are often caused by issues such as resource failures in the underlying infrastructure. The provenance information collected by workflow management systems only captures the transformation of data at the workflow level. Analyzing provenance information together with the relevant system metrics requires expertise and manual effort. Moreover, it is often time-consuming to aggregate this information and correlate events occurring at different levels of the infrastructure. In this paper, we propose an architecture to automate the integration of workflow provenance information and performance information from the infrastructure level. Our architecture enables workflow developers and domain scientists to effectively browse workflow execution information together with system metrics, and to analyze contextual information for possible anomalies.
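The core correlation step implied by this architecture, aligning workflow-level provenance events with infrastructure-level metrics by timestamp, could be sketched as follows; the column names and data are invented, and pandas merge_asof is used purely for illustration, not as the proposed system.

```python
# Rough sketch of timestamp-based linking: for each workflow provenance event,
# attach the most recent infrastructure metric sample, so a failed or slow
# task can be inspected alongside the system state at that moment.
# Columns and values are invented; this is not the paper's architecture.
import pandas as pd

# Provenance events as recorded by a workflow management system.
provenance = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-07-01 10:00:05", "2019-07-01 10:02:40"]),
    "task": ["align_reads", "call_variants"],
    "status": ["completed", "failed"],
})

# Node-level metrics sampled by an infrastructure monitoring agent.
metrics = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-07-01 10:00:00", "2019-07-01 10:01:00",
                                 "2019-07-01 10:02:00", "2019-07-01 10:03:00"]),
    "cpu_load": [0.42, 0.55, 0.97, 0.99],
    "free_mem_gb": [12.0, 11.2, 0.8, 0.5],
})

# Attach to each event the latest metric sample at or before the event time
# (both frames must be sorted by timestamp for merge_asof).
linked = pd.merge_asof(provenance.sort_values("timestamp"),
                       metrics.sort_values("timestamp"),
                       on="timestamp", direction="backward")
print(linked)  # the failed task lines up with high CPU load and low free memory
```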
{"title":"Contextual Linking between Workflow Provenance and System Performance Logs","authors":"Elias el Khaldi Ahanach, Spiros Koulouzis, Zhiming Zhao","doi":"10.1109/eScience.2019.00093","DOIUrl":"https://doi.org/10.1109/eScience.2019.00093","url":null,"abstract":"When executing scientific workflows, anomalies of the workflow behavior are often caused by different issues such as resource failures at the underlying infrastructure. The provenance information collected by workflow management systems only captures the transformation of data at the workflow level. Analyzing provenance information and apposite system metrics requires expertise and manual effort. Moreover, it is often timeconsuming to aggregate this information and correlate events occurring at different levels of the infrastructure. In this paper, we propose an architecture to automate the integration among workflow provenance information and performance information from the infrastructure level. Our architecture enables workflow developers or domain scientists to effectively browse workflow execution information together with the system metrics, and analyze contextual information for possible anomalies.","PeriodicalId":142614,"journal":{"name":"2019 15th International Conference on eScience (eScience)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114195286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}