Data Science Journal: Latest Articles

Capacity Development and Collaboration for Sustainable African Agriculture: Amplification of Impact Through Hackathons
Q2 Computer Science Pub Date : 2021-07-20 DOI: 10.5334/dsj-2021-023
K. Charvát, B. Bye, Hana Kubícková, Foteini Zampati, Tuula Löytty, Kizito Odhiambo, K. Kamau, S. Anand, P. Kasoma, Maximilien Houël, Elias Cherenet, Akaninyene Obot, Felix Kariuki, Antoine Kantiza, Ronald Ssembajwe, Samuel Njogo, W. Kamau
The paper describes the concept of the INSPIRE Kampala virtual hackathons, whose main aim is to build and strengthen relationships between several European Union (EU) projects and African communities, an effort that started in 2019 with the Nairobi INSPIRE Hackathon. The main focus is exploring a new model for capacity building based on virtual hackathons, which offer an excellent opportunity to bring together people from different work environments, cultures and disciplinary backgrounds. The paper describes the experience and lessons learned from the Kampala INSPIRE Hackathon. INSPIRE Hackathons have evolved over the five years since they started, and during this period we developed a model of fully virtual hackathons, which we regard as optimal for Africa. The paper describes all stages of building a hackathon: definition of themes, selection of mentors, development, webinars as tools for sharing experience, final presentations, selection of winners and the awards ceremony. We also consider planning further actions important, because we see the INSPIRE Hackathon not as a single event but as a continuous process. The demonstration part of the paper describes the lessons learned from the winning challenge, Desert Locust Monitoring. The description of all phases demonstrates the Kampala INSPIRE Hackathon approach. On the basis of this experience, we defined a strategy for how to continue and successfully extend such a model in Africa.
Citations: 6
SwissEnvEO: A FAIR National Environmental Data Repository for Earth Observation Open Science
Q2 Computer Science Pub Date : 2021-05-31 DOI: 10.5334/DSJ-2021-022
G. Giuliani, Hugues Cazeaux, Pierre-Yves Burgi, Charlotte Poussin, Jean-Philippe Richard, B. Chatenoux
Environmental scientific research is increasingly data-driven and dependent on high-performance computing infrastructures to process ever larger and more diverse data sets. Consequently, there is a growing recognition of the need to share data, methods, algorithms, and infrastructure to make scientific research more effective, efficient, open, transparent, reproducible, accessible, and usable by different users. However, Earth Observations (EO) Open Science is still undervalued, and various challenges remain in achieving the vision of transforming EO data into actionable knowledge by lowering the entry barrier to massive-use Big Earth Data analysis and derived information products. Currently, FAIR-compliant digital repositories cannot fully satisfy the needs of EO users, while Spatial Data Infrastructures (SDI) are not fully FAIR-compliant and have difficulties in handling Big Earth Data. In response to these issues and the need to strengthen Open and Reproducible EO science, this paper presents SwissEnvEO, a Spatial Data Infrastructure complemented with digital repository capabilities to facilitate the publication of Ready to Use information products, at national scale, derived from satellite EO data available in an EO Data Cube in full compliance with FAIR principles.
Citations: 9
Correction: Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data
Q2 Computer Science Pub Date : 2021-05-25 DOI: 10.5334/DSJ-2021-021
M. Parsons
Citations: 0
A Controlled Vocabulary and Metadata Schema for Materials Science Data Discovery
Q2 Computer Science Pub Date : 2021-04-29 DOI: 10.5334/DSJ-2021-018
Andrea Medina-Smith, C. Becker, R. Plante, L. Bartolo, A. Dima, J. Warren, R. Hanisch
The International Materials Resource Registries (IMRR) working group of the Research Data Alliance (RDA) was created to spur initial development of a federated registry system to allow for easier discovery and access to materials data. As part of this effort, a controlled vocabulary and metadata schema were developed with contributions from members of the working group and other experts. Here we describe the process, the resulting vocabulary and XML schema, and lessons learned in the development and use of the schema.
Citations: 10
Improving Discovery and Use of NASA’s Earth Observation Data Through Metadata Quality Assessments
Q2 Computer Science Pub Date : 2021-04-28 DOI: 10.5334/DSJ-2021-017
K. Bugbee, J. le Roux, Adam W. Sisco, A. Kaulfus, Patrick Staton, Camille Woods, V. Dixon, C. Lynnes, R. Ramachandran
High quality descriptive metadata is essential to enabling the effective discovery of Earth observation data to a growing number of diverse users. In this paper, we define a framework to assess the quality of NASA’s Earth observation metadata with the overarching goal of improving the discoverability, accessibility and usability of the data it describes. The framework, developed by the Analysis and Review of the Common Metadata Repository (ARC) team, focuses on the metadata quality dimensions of correctness, completeness, and consistency. The methodology used by the ARC team to implement the framework is described, as well as best practices, lessons learned and recommendations for implementing similar metadata quality assessment processes. Initial results from the project indicate that this methodology, in combination with community and stakeholder collaboration, is effective in improving metadata quality.
Citations: 7
A survey of researchers' needs and priorities for data sharing
Q2 Computer Science Pub Date : 2021-02-22 DOI: 10.31219/osf.io/njr5u
I. Hrynaszkiewicz, J. Harney, L. Cadwallader
PLOS has long supported Open Science. One of the ways in which we do so is via our stringent data availability policy established in 2014. Despite this policy, and more data sharing policies being introduced by other organizations, best practices for data sharing are adopted by a minority of researchers in their publications. Problems with effective research data sharing persist, and previous research has attributed these problems to a lack of time, resources, incentives, and/or skills to share data. In this study we built on this research by investigating the importance of tasks associated with data sharing, and researchers’ satisfaction with their ability to complete these tasks. By investigating these factors we aimed to better understand opportunities for new or improved solutions for sharing data. In May-June 2020 we surveyed researchers from Europe and North America, asking them to rate tasks associated with data sharing on (i) their importance and (ii) their satisfaction with their ability to complete them. We received 728 completed and 667 partial responses. We calculated mean importance and satisfaction scores to highlight potential opportunities for new solutions and to compare different cohorts. Tasks relating to research impact, funder compliance, and credit had the highest importance scores. 52% of respondents reuse research data, but the average satisfaction score for obtaining data for reuse was relatively low. Tasks associated with sharing data were rated somewhat important, and respondents were reasonably well satisfied with their ability to accomplish them. Notably, this included tasks associated with best data sharing practice, such as use of data repositories. However, the most common method for sharing data was in fact via supplemental files with articles, which is not considered to be best practice. We presume that researchers are unlikely to seek new solutions to a problem or task that they are satisfied with their ability to accomplish, even if many do not attempt this task. This implies there are few opportunities for new solutions or tools to meet these researcher needs. Publishers can likely meet these needs for data sharing by working to seamlessly integrate existing solutions that reduce the effort or behaviour change involved in some tasks, and by focusing on advocacy and education around the benefits of sharing data. There may, however, be opportunities (unmet researcher needs) in relation to better supporting data reuse, which could be met in part by strengthening the data sharing policies of journals and publishers, and by improving the discoverability of data associated with published articles.
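The scoring step mentioned in this abstract (mean importance and mean satisfaction per task) can be illustrated with a minimal sketch. The function name `score_tasks`, the input layout, and the ratings in the usage note are hypothetical, and the "gap" column (importance minus satisfaction) is an assumption added here for illustration; the abstract does not say how the two means were combined.

```python
from statistics import mean


def score_tasks(responses):
    """Compute mean importance and satisfaction per task.

    `responses` maps a task name to a list of (importance, satisfaction)
    rating pairs. This layout is a hypothetical one chosen for the sketch.
    """
    scores = {}
    for task, ratings in responses.items():
        importance = mean(r[0] for r in ratings)
        satisfaction = mean(r[1] for r in ratings)
        scores[task] = {
            "importance": importance,
            "satisfaction": satisfaction,
            # Assumption: a large positive gap suggests an unmet need.
            "gap": importance - satisfaction,
        }
    return scores
```

For example, calling `score_tasks({"obtain data for reuse": [(5, 2), (4, 3)]})` with these made-up ratings yields a mean importance of 4.5 and a mean satisfaction of 2.5 for that task.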
Citations: 12
Implementing a Registry Federation for Materials Science Data Discovery.
Q2 Computer Science Pub Date : 2021-01-01 DOI: 10.5334/dsj-2021-015
Raymond L Plante, Chandler A Becker, Andrea Medina-Smith, Kevin Brady, Alden Dima, Benjamin Long, Laura M Bartolo, James A Warren, Robert J Hanisch

As a result of a number of national initiatives, we are seeing rapid growth in the data important to materials science that are available over the web. Consequently, it is becoming increasingly difficult for researchers to learn what data are available and how to access them. To address this problem, the Research Data Alliance (RDA) Working Group for International Materials Science Registries (IMRR) was established to bring together materials science and information technology experts to develop an international federation of registries that can be used for global discovery of data resources for materials science. A resource registry collects high-level metadata descriptions of resources such as data repositories, archives, websites, and services that are useful for data-driven research. By making the collection searchable, it helps scientists in industry, universities, and government laboratories to discover data relevant to their research and work interests. We present the results of our successful piloting of a registry federation for materials science data discovery. In particular, we lay out a blueprint for creating such a federation that is capable of amassing a global view of all available materials science data, and we enumerate the requirements for the standards that make the registries interoperable within the federation. These standards include a protocol for exchanging resource descriptions and a standard metadata schema for encoding those descriptions. We summarize how we leveraged an existing standard (OAI-PMH) for metadata exchange. Finally, we review the registry software developed to realize the federation and describe the user experience.
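To make the OAI-PMH exchange mentioned in this abstract more concrete, here is a minimal sketch of how a harvesting registry might build a `ListRecords` request. The `verb`, `metadataPrefix`, `from`, and `set` parameters come from the OAI-PMH protocol itself; the function name `build_list_records_url` and the base URL in the usage note are hypothetical, and `oai_dc` is used only because every OAI-PMH repository must support it. The federation's own metadata schema prefix is not given in the abstract.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a harvest.

    `base_url` is the endpoint of the registry being harvested
    (hypothetical here). Optional arguments narrow the harvest to
    records changed since `from_date` or belonging to `set_spec`.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec    # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"
```

A call such as `build_list_records_url("https://registry.example.org/oai", from_date="2021-01-01")` produces a URL that asks the endpoint for all records changed since that date.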

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8596377/pdf/
Citations: 6
Public Health Benefits and Ethical Aspects in the Collection and Open Sharing of Wastewater-Based Epidemic Data on COVID-19
Q2 Computer Science Pub Date : 2021-01-01 DOI: 10.5334/dsj-2021-027
R. Honda, M. Murakami, A. Hata, M. Ihara
Collection and open sharing of wastewater-based epidemic data can potentially provide immense public health benefits during outbreaks of infectious diseases such as COVID-19. By detecting and localizing unidentified infections early, wastewater surveillance is expected to enable early and targeted containment of a local outbreak. Wastewater surveillance offers potentially high public health benefits when a small catchment is targeted; however, it may lead to stigmatization of and discrimination against the targeted group. Therefore, public commitment is crucial for the collection and open sharing of wastewater-based epidemic data. When wastewater-based epidemic data are shared, the technical limitations and uncertainty of the collected data should be shared at the same time, on the basis of scientific communication. A useful application of wastewater-based epidemic data is to complement clinical epidemic data, which may be biased and may overlook unidentified infections. To acquire public commitment toward the collection and open sharing of wastewater-based epidemic data, stakeholders need to reach a consensus on possible restrictive measures taken in response to the collected data, as well as on appropriate handling of the collected data to prevent stigmatization and discrimination. © 2021 The Author(s).
Citations: 7
We Can Make a Better Use of ORCID: Five Observed Misapplications
Q2 Computer Science Pub Date : 2021-01-01 DOI: 10.5334/dsj-2021-038
Miriam Baglioni, P. Manghi, A. Mannocci, A. Bardi
Since 2012, the “Open Researcher and Contributor ID” organisation (ORCID) has been successfully running a worldwide registry, with the aim of “providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities”. Any service in the scholarly communication ecosystem (e.g., publishers, repositories, CRIS systems, etc.) can contribute to a non-ambiguous scholarly record by including, during metadata deposition, referrals to iDs in the ORCID registry. The OpenAIRE Research Graph is a scholarly knowledge graph that aggregates both records from the ORCID registry and publication records with ORCID referrals from publishers and repositories worldwide to yield research impact monitoring and Open Science statistics. Graph data analytics revealed “anomalies” due to ORCID registry “misapplications”, caused by wrong ORCID referrals and misexploitation of the ORCID registry. Albeit these affect just a minority of ORCID records, they inevitably affect the quality of the ORCID infrastructure and may fuel the rise of detractors and scepticism about the service. In this paper, we classify and qualitatively document such misapplications, identifying five ORCID registrant-related and ORCID referral-related anomalies to raise awareness among ORCID users. We describe the current countermeasures taken by ORCID and, where applicable, provide recommendations. Finally, we elaborate on the importance of a community-steered Open Science infrastructure and the benefits this approach has brought and may bring to ORCID.
Citations: 9
Towards Globally Unique Identification of Physical Samples: Governance and Technical Implementation of the IGSN Global Sample Number
Q2 Computer Science Pub Date : 2021-01-01 DOI: 10.5334/dsj-2021-033
J. Klump, K. Lehnert, D. Ulbricht, A. Devaraju, K. Elger, D. Fleischer, S. Ramdeen, L. Wyborn
Persistent unique identifiers (PID) are a critical element in digital research data infrastructure to unambiguously identify, locate, and cite digital representations of a growing range of entities – publications, data, instruments, organizations, funding awards, field programs, and others. The IGSN was developed as the International Geo Sample Number to provide a persistent, globally unique, web resolvable identifier for physical samples. IGSN is both a governance and technical system for assigning globally unique persistent identifiers to physical samples. Even though initially developed for samples in the geosciences, the application of IGSN can be and has already been expanded to other domains that rely on physical samples and collections. This paper describes the current architecture and technical implementation of IGSN, how IGSN relates to other sample identifiers, and how its technical systems are supported by an international governance structure.
Citations: 9