
Latest publications from the 2019 15th International Conference on eScience (eScience)

Interactivity, Distributed Workflows, and Thick Provenance: A Review of Challenges Confronting Digital Humanities Research Objects
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00067
Katrina Fenlon
Despite the rapid growth of digital scholarship in the humanities, most existing humanities research infrastructures lack adequate support for the creation, management, sharing, maintenance, and preservation of complex, networked digital objects. Research Objects (ROs) have mainly been applied to scientific research workflows, but the RO model and parallel approaches have gained enough uptake in the humanities to suggest their potential to undergird sustainable, networked humanities research infrastructure. This paper reviews several compelling humanities applications of the RO model and closely related models in platforms for data sharing, computational text analysis, collaborative annotation, and digital and semantic publishing, and in domain repositories. The paper identifies challenges confronting the broad application of ROs in the humanities - challenges that will confront any emergent model for humanities data- or workflow-packaging and publication - and suggests implications for implementations in humanities cyberinfrastructure.
Citations: 1
Transkribus. A Platform for Automated Text Recognition and Searching of Historical Documents
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00060
S. Colutto, Philip Kahle, Günter Hackl, Günter Mühlberger
The Transkribus platform provides services and tools for the digitization, transcription, recognition and searching of historical documents. It is the only platform worldwide where non-technical users are able to train their own machine-learning-based neural networks and apply them to their documents in order to generate an automated transcription and make the documents searchable via keyword spotting. Transkribus is used by thousands of users and hundreds of archives, libraries, and research groups all over the world. In this paper we briefly describe the platform's approach in terms of its underlying business and governance model, as well as its technical aspects.
Citations: 11
Understanding a Rapidly Expanding Refugee Camp Using Convolutional Neural Networks and Satellite Imagery
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00034
S. Benz, Hogeun Park, Jiaxin Li, Daniel Crawl, J. Block, M. Nguyen, I. Altintas
In summer 2017, close to one million Rohingya, an ethnic minority group in Myanmar, fled to Bangladesh due to the persecution of Muslims. This large influx of refugees settled around existing refugee camps. Because of this dramatic expansion, the newly established Kutupalong-Balukhali expansion site lacked basic infrastructure and public services. While Non-Governmental Organizations (NGOs) such as the Refugee Relief and Repatriation Commissioner (RRCC) conducted a series of counting exercises to understand the demographics of the refugees, our understanding of camp formation is still limited. Since household-type surveys are time-consuming and do not capture geo-information, we propose to use a combination of high-resolution satellite imagery and machine learning (ML) techniques to assess the spatiotemporal dynamics of the refugee camp. Four Very-High-Resolution (VHR) images (i.e., World View-2) are analyzed to compare the camp pre- and post-influx. Using deep learning and unsupervised learning, we organized the satellite image tiles of a given region into geographically relevant categories. Specifically, we used a pre-trained convolutional neural network (CNN) to extract features from the image tiles, followed by cluster analysis to segment the extracted features into similar groups. Our results show that the size of the built-up area increased significantly from 0.4 km² in January 2016 and 1.5 km² in May 2017 to 8.9 km² in December 2017 and 9.5 km² in February 2018. Through the benefits of unsupervised machine learning, we further detected the densification of the refugee camp over time and were able to display its heterogeneous structure. The developed method is scalable and applicable to rapidly expanding settlements across various regions. It is thus a useful tool to enhance our understanding of the structure of refugee camps, enabling us to allocate resources for humanitarian needs to the most vulnerable populations.
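The tile-clustering step of this pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the CNN features have already been extracted (here they are simulated), and the tile footprint, cluster count, and feature dimensionality are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assume each satellite image tile has already been passed through a
# pre-trained CNN, yielding one 512-d feature vector per tile. We simulate
# two well-separated groups of tiles standing in for built-up vs. vegetated.
rng = np.random.default_rng(0)
built_up = rng.normal(loc=1.0, scale=0.2, size=(60, 512))
vegetation = rng.normal(loc=-1.0, scale=0.2, size=(40, 512))
features = np.vstack([built_up, vegetation])

# Unsupervised step: cluster the tiles into geographically relevant groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_

# Estimate built-up area: count tiles in the cluster dominated by the
# built-up simulations, times an assumed tile footprint of 0.01 km².
TILE_AREA_KM2 = 0.01
built_cluster = np.bincount(labels[:60]).argmax()
area_km2 = (labels == built_cluster).sum() * TILE_AREA_KM2
print(f"estimated built-up area: {area_km2:.2f} km²")
```

Repeating this on tiles from each acquisition date yields the kind of built-up-area time series reported above.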
Citations: 2
Reference Exascale Architecture
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00063
Martin Bobák, Balázs Somosköi, Mara Graziani, M. Heikkurinen, Maximilian Höb, Jan Schmidt, L. Hluchý, A. Belloum, R. Cushing, J. Meizner, P. Nowakowski, V. Tran, O. Habala, J. Maassen
While political commitments for building exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational and skills-related challenges. The key technical challenges are related to the availability of data. While the first exascale machines are likely to be built within a single site, the input data is in many cases impossible to store within a single site. Alongside handling extremely large amounts of data, an exascale system has to process data from different sources, support accelerated computing, handle a high volume of requests per day, minimize the size of data flows, and be extensible in terms of continuously increasing data as well as increases in parallel requests. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: a virtualization layer, a distributed virtual file system, and a manager of computing resources. Its main property is modularity, which is achieved by containerization at two levels: 1) application containers - containerization of scientific workflows, 2) micro-infrastructure - containerization of the extremely large, service-oriented data infrastructure. The paper also presents an instantiation of the reference architecture - the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS) - and discusses its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. This work presents the requirements and the derived architecture, as well as the five use-case pilots that it made possible.
Citations: 0
Reliability-Aware and Graph-Based Approach for Rank Aggregation of Biological Data
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00022
Pierre Andrieu, Bryan Brancotte, L. Bulteau, Sarah Cohen-Boulakia, A. Denise, A. Pierrot, Stéphane Vialette
Massive biological datasets are available in public databases and can be queried through portals using keyword queries, from which users obtain ranked lists of answers. However, properly querying such portals remains difficult, since various formulations of the same query can be considered (e.g., using synonyms). Consequently, users have to manually combine several lists of hundreds of answers into one list. Rank aggregation techniques are particularly well suited to this context, as they take in a set of ranked elements (rankings) and provide a consensus, that is, a single ranking which is the "closest" to the input rankings. However, the problem of rank aggregation is NP-hard in most cases; using an exact algorithm is currently not possible for more than a few dozen elements. A plethora of heuristics have thus been proposed, whose behaviour is, in essence, difficult to anticipate: given a set of input rankings, one cannot guarantee how far from an exact solution the consensus ranking provided by a heuristic will be. The two challenges we tackle in this paper are the following: (i) providing an approach based on a pre-process that decomposes large datasets into smaller ones on which high-quality algorithms can be run, and (ii) providing users with information on the robustness of the positions of elements in the consensus ranking produced. Our approach is not only grounded in mathematical bases, offering guarantees on the computed result, but has also been implemented in a real system available to the life science community and tested on various real use cases.
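As a concrete illustration of rank aggregation, the sketch below uses a simple Borda-count heuristic and a Kendall-tau distance to measure how close the consensus is to each input ranking. The gene identifiers are invented for the example, and this heuristic is only a stand-in: the paper's reliability-aware, graph-based algorithm is considerably more sophisticated.

```python
from collections import defaultdict
from itertools import combinations

def borda_consensus(rankings):
    """Borda-count heuristic: each element accumulates its position in
    every input ranking; lower total position means a better final rank."""
    scores = defaultdict(int)
    for ranking in rankings:
        for pos, elem in enumerate(ranking):
            scores[elem] += pos
    return sorted(scores, key=lambda e: (scores[e], e))

def kendall_tau_distance(r1, r2):
    """Number of element pairs ordered differently in the two rankings."""
    pos1 = {e: i for i, e in enumerate(r1)}
    pos2 = {e: i for i, e in enumerate(r2)}
    return sum(1 for a, b in combinations(pos1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)

# Three rankings of the same answers, as if returned by three
# differently phrased keyword queries against a biological portal.
rankings = [["BRCA1", "TP53", "EGFR", "KRAS"],
            ["TP53", "BRCA1", "KRAS", "EGFR"],
            ["BRCA1", "TP53", "KRAS", "EGFR"]]
consensus = borda_consensus(rankings)
print(consensus)                # ['BRCA1', 'TP53', 'KRAS', 'EGFR']
print([kendall_tau_distance(consensus, r) for r in rankings])  # [1, 1, 0]
```

The Kendall-tau distances quantify the "closeness" mentioned above: the consensus disagrees with each input on at most one pair.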
Citations: 2
The Engagement and Performance Operations Center: EPOC
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00086
Edward R. Moynihan, J. Schopf, J. Zurawski
In 2018, the US National Science Foundation (NSF) funded the Engagement and Performance Operations Center (EPOC), a joint project between Indiana University (IU) and the Department of Energy's Energy Science Network (ESnet), to work with domain scientists to accelerate the ability of distributed collaborations to share data in order to reach broader science goals. The goal of this funding was to create an operations center for engagement - including definition of formal processes, tracking of engagements, and funded staff, not simply best effort by volunteers, with a goal of enabling digital societies to better share scientific data.
Citations: 2
ENVRI-FAIR - Interoperable Environmental FAIR Data and Services for Society, Innovation and Research
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00038
A. Petzold, A. Asmi, A. Vermeulen, G. Pappalardo, D. Bailo, D. Schaap, H. Glaves, U. Bundke, Zhiming Zhao
ENVRI-FAIR is a recently launched project of the European Union's Horizon 2020 program (EU H2020), connecting the cluster of European Environmental Research Infrastructures (ENVRI) to the European Open Science Cloud (EOSC). The overarching goal of ENVRI-FAIR is that all participating research infrastructures (RIs) will provide a set of interoperable FAIR data services that enhance the efficiency and productivity of researchers, support innovation, enable data-and knowledge-based decisions and connect the ENVRI cluster to the EOSC. This goal will be reached by: (1) defining community policies and standards across all stages of the data life cycle, aligned with the wider European policies and with international developments; (2) creating for all participating RIs sustainable, transparent and auditable data services for each stage of the data life cycle, following the FAIR principles; (3) implementing prototypes for testing pre-production services at each RI, leading to a catalogue of prepared services; (4) exposing the complete set of thematic data services and tools of the ENVRI cluster to the EOSC catalogue of services.
Citations: 15
Reproducibility by Other Means: Transparent Research Objects
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00066
T. McPhillips, C. Willis, M. Gryk, Santiago Núñez Corrales, Bertram Ludäscher
Research Objects have the potential to significantly enhance the reproducibility of scientific research. One important way Research Objects can do this is by encapsulating the means for re-executing the computational components of studies, thus supporting the new form of reproducibility enabled by digital computing: exact repeatability. However, Research Objects can also make scientific research more reproducible by supporting transparency, a component of reproducibility orthogonal to re-executability. We describe here our vision for making Research Objects more transparent by providing means for disambiguating claims about reproducibility generally, and computational repeatability specifically. We show how support for science-oriented queries can enable researchers to assess the reproducibility of Research Objects and the individual methods and results they encapsulate.
Citations: 5
ForestEyes Project: Can Citizen Scientists Help Rainforests?
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00010
F. B. J. R. Dallaqua, Á. Fazenda, F. Faria
Scientific projects that involve volunteers in analyzing and collecting data and in contributing their computational resources, known as Citizen Science (CS), have become popular due to advances in information and communication technology (ICT). Many CS projects have been proposed to involve citizens in different knowledge domains such as astronomy, chemistry, mathematics, and physics. This work presents a CS project called ForestEyes, which proposes to track deforestation in rainforests by asking volunteers to analyze and classify remote sensing images. These manually classified data are used as input for training a pattern classifier that will be used to label new remote sensing images. The ForestEyes project was created on the Zooniverse.org CS platform, and to assess the quality of the volunteers' answers, early campaigns were performed with remote sensing images from the Brazilian Legal Amazon (BLA). The results were processed and compared to an oracle classification (PRODES - Amazon Deforestation Monitoring Project). Two and a half weeks after launch, more than 35,000 answers from 383 volunteers (117 anonymous and 266 registered users) were received, completing all 2050 tasks. The ForestEyes campaigns' results show that volunteers achieved excellent effectiveness in remote sensing image classification tasks. Furthermore, these results show that CS might be a powerful tool to quickly obtain a large amount of high-quality labeled data.
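A minimal sketch of the labeling-then-training pipeline this abstract describes might look as follows. The majority-vote rule, feature values, and class names are illustrative assumptions, not ForestEyes data or code.

```python
import numpy as np
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def majority_label(votes):
    """Reduce redundant volunteer answers for one task to a single label."""
    return Counter(votes).most_common(1)[0][0]

# Several volunteers classify the same segment; majority vote wins.
assert majority_label(["forest", "forest", "deforested"]) == "forest"

# Simulated per-segment band statistics: forested segments cluster around
# (0.8, 0.1), deforested ones around (0.2, 0.7). In practice these would be
# spectral features of the manually classified image segments.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.8, 0.1], 0.05, (20, 2)),
               rng.normal([0.2, 0.7], 0.05, (20, 2))])
y = ["forest"] * 20 + ["deforested"] * 20

# Train a pattern classifier on the volunteer-derived labels, then use it
# to label new, unseen remote sensing segments.
clf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
print(clf.predict([[0.75, 0.15], [0.25, 0.65]]))  # ['forest' 'deforested']
```

Predictions on new imagery could then be compared against an oracle such as PRODES, mirroring the evaluation described above.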
Citations: 3
Contextual Linking between Workflow Provenance and System Performance Logs
Pub Date : 2019-09-01 DOI: 10.1109/eScience.2019.00093
Elias el Khaldi Ahanach, Spiros Koulouzis, Zhiming Zhao
When executing scientific workflows, anomalies in workflow behavior are often caused by issues such as resource failures in the underlying infrastructure. The provenance information collected by workflow management systems only captures the transformation of data at the workflow level. Analyzing provenance information alongside the apposite system metrics requires expertise and manual effort. Moreover, it is often time-consuming to aggregate this information and to correlate events occurring at different levels of the infrastructure. In this paper, we propose an architecture to automate the integration of workflow provenance information with performance information from the infrastructure level. Our architecture enables workflow developers and domain scientists to effectively browse workflow execution information together with system metrics, and to analyze contextual information for possible anomalies.
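The core of such a correlation is attaching infrastructure-level metrics to each workflow-level provenance event, for example by matching timestamps on the same host within a window. This is a minimal sketch under assumed record shapes (the `name`/`host`/`t` fields are hypothetical), not the paper's architecture.

```python
def link_events(prov_events, sys_metrics, window=5.0):
    """For each provenance event, collect the system metrics recorded
    within `window` seconds of the event's timestamp on the same host."""
    linked = []
    for ev in prov_events:
        context = [m for m in sys_metrics
                   if m["host"] == ev["host"]
                   and abs(m["t"] - ev["t"]) <= window]
        linked.append({"event": ev["name"], "context": context})
    return linked

# Hypothetical records: one failed task and two metric samples.
prov = [{"name": "task_A_failed", "host": "node1", "t": 100.0}]
metrics = [{"host": "node1", "t": 98.5, "cpu": 0.99},
           {"host": "node2", "t": 99.0, "cpu": 0.10}]
print(link_events(prov, metrics))
```

Seeing a CPU spike on the same host just before a task failure is exactly the kind of contextual information the abstract says is otherwise tedious to assemble by hand.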
Citations: 8
Journal
2019 15th International Conference on eScience (eScience)