Implementation of a Federated Information System by Means of Reuse of Research Data Archived in Research Data Repositories
Sylvia Melzer, Stefan Thiemann, Simon Schiff, Ralf Möller
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-039
At universities, research data is increasingly stored in research data repositories according to a data management plan (DMP) and thus made available for further use. When hundreds, thousands, or even millions of data sets are to be reused, the challenge is to gain an overview of the data quickly and to search across all of it. The high variability of the formats used to store research data requires a new approach to data reusability that focuses on the visualisation and searchability of archived research data and allows the resulting information systems to be combined with each other. In this article, we present a practical DMP that describes how information systems can be created on demand by reusing research data archived in research data repositories, and how these systems can be merged into a federated information system. In our projects, this has allowed information systems to be created within minutes or a few hours using few resources. Creating a federated system still requires an initial effort, but it then enables federated searches. A federated system can subsequently be extended to include further information systems with a few configuration changes and manageable adjustments to the source code.
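
The abstract does not say how the federated search is implemented. The following is a minimal sketch that assumes each information system exposes a common query interface. A federated search sends one query to every registered system and merges the hits. `FederatedSearch`, `make_keyword_source` and the sample records are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A source takes a query string and returns (identifier, title) hits.
Source = Callable[[str], List[Tuple[str, str]]]


@dataclass(frozen=True)
class Record:
    source: str      # name under which the information system was registered
    identifier: str  # record identifier within that system
    title: str


def make_keyword_source(titles: Dict[str, str]) -> Source:
    """Toy information system: case-insensitive substring match on titles."""
    def search(query: str) -> List[Tuple[str, str]]:
        q = query.lower()
        return [(rid, title) for rid, title in titles.items() if q in title.lower()]
    return search


class FederatedSearch:
    """Send one query to every registered source and merge the hits."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        # Adding another information system is a single registration call.
        self._sources[name] = source

    def search(self, query: str) -> List[Record]:
        hits = [
            Record(name, rid, title)
            for name, source in self._sources.items()
            for rid, title in source(query)
        ]
        # Deterministic order: by source name, then identifier.
        return sorted(hits, key=lambda r: (r.source, r.identifier))


if __name__ == "__main__":
    fed = FederatedSearch()
    fed.register("manuscripts", make_keyword_source({"m1": "Letter from Hamburg", "m2": "Ship log"}))
    fed.register("images", make_keyword_source({"i7": "Hamburg harbour photo"}))
    for record in fed.search("hamburg"):
        print(record.source, record.identifier, record.title)
```

In this design, extending the federation only requires registering another callable with the same signature. A real deployment would also have to map differing metadata schemas onto a shared record type.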

ENEA PAES: A Web Platform for Supporting Italian Municipalities in Sustainable Energy Action Plan
Fabio Cignini, Enrico Cosimi, Vittoria Cozza, Flavio Fontana, Maurizio Matera, Giangiacomo Ponzo, Maria Salvato, Veronica Tomassetti
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-037
The Covenant of Mayors promotes the Sustainable Energy Action Plan (SEAP), which aims to mitigate greenhouse gas (GHG) emissions in line with the European Union’s 2030 and 2050 targets. Covenant signatories would benefit greatly from a digital platform that lets users without technical skills, who make up the majority of signatories, draft a SEAP. The Italian National Agency for New Technologies, Energy and Sustainable Economic Development (ENEA) has developed the PAES platform to provide digital support to public administrations (PA) that have joined the Covenant of Mayors. The platform uses open data and is fed with energy data aggregated at the municipal level. It offers functionality for completing the baseline CO2 emissions inventory (BEI), together with a best practice (BP) simulation tool. The simulation tool places each BP in context and estimates its effect on the main GHG emissions. The BP with the best estimated results can then be converted into concrete adaptation actions. The system thus supports local Italian municipalities in the strategic planning and monitoring of adaptation actions over time.
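
The abstract does not give the platform's formulas. A common way to compile a baseline emission inventory is to multiply each energy carrier's consumption by an emission factor and sum the results. The sketch below assumes that approach and uses placeholder emission factors. It models a best practice simply as a percentage saving on one carrier. All function names and numbers are hypothetical and are not taken from ENEA PAES.

```python
from typing import Dict

# Placeholder emission factors in tonnes CO2 per MWh. Illustrative values only;
# they are NOT the factors used by the ENEA PAES platform.
EMISSION_FACTORS: Dict[str, float] = {
    "electricity": 0.4,
    "natural_gas": 0.2,
}


def baseline_emissions(consumption_mwh: Dict[str, float],
                       factors: Dict[str, float] = EMISSION_FACTORS) -> float:
    """Baseline inventory total in t CO2: sum over carriers of consumption x factor."""
    return sum(mwh * factors[carrier] for carrier, mwh in consumption_mwh.items())


def best_practice_reduction(consumption_mwh: Dict[str, float], carrier: str,
                            saving_fraction: float,
                            factors: Dict[str, float] = EMISSION_FACTORS) -> float:
    """Estimated t CO2 avoided if a measure cuts one carrier's consumption by saving_fraction."""
    if not 0.0 <= saving_fraction <= 1.0:
        raise ValueError("saving_fraction must lie between 0 and 1")
    return consumption_mwh.get(carrier, 0.0) * saving_fraction * factors[carrier]


if __name__ == "__main__":
    # Hypothetical municipality: annual consumption per energy carrier in MWh.
    municipality = {"electricity": 10_000.0, "natural_gas": 25_000.0}
    bei = baseline_emissions(municipality)
    saved = best_practice_reduction(municipality, "electricity", 0.15)
    print(f"BEI: {bei:.0f} t CO2, best practice saves {saved:.0f} t CO2")
```

For the sample municipality, the inventory is 10,000 × 0.4 + 25,000 × 0.2 = 9,000 t CO2. A 15% electricity saving avoids 10,000 × 0.15 × 0.4 = 600 t CO2. Comparing such estimates across candidate measures is one way to rank best practices.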

A Programmatic and Scalable Approach to Making Data Management Machine-Actionable
M. Praetzellis, M. Buys, Xiaoli Chen, J. Chodacki, N. Davies, Kristian Garza, Catherine Nancarrow, Brian Riley, E. Robinson
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-026

Data Science in a Pandemic
Dennis F. X. Mathaisel
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-041
Data Science has the potential to provide humanity with critical insight into the massive amounts of data collected during a pandemic. The COVID-19 pandemic presented that opportunity, and Data Science supported an international audience promptly, reliably, effectively, and frequently during that difficult time. The most significant contributions were data visualizations and data dashboards; however, other tools, such as predictive and prescriptive analytics, were equally critical to the effort. The urgency at the start of the pandemic was to communicate information quickly to citizens, governments, and institutions. The shift from traditional statistical metrics and tables to data visualizations was highly significant and helped many people. This paper reviews these contributions by showing how the COVID-19 story unfolded through author-generated data visualizations and dashboards, and by giving the community open-source access to the R scripts that generated these visualizations. This open-source access to the scripts is the article’s novel contribution to the literature. Using publicly available datasets from multiple sources and R toolkits, the author validates the role that Data Science can play in a pandemic, using methods that anyone with basic knowledge of a scripting language such as R can implement. The intent is to provide these tools to the community and to demonstrate their effectiveness in the likely event of another crisis.

Towards a Toolbox for Automated Assessment of Machine-Actionable Data Management Plans
Tomasz Miksa, M. Suchánek, Jan Slifka, Vojtěch Knaisl, F. Ekaputra, Filip Kovacevic, Annisa Maulida Ningtyas, Alaa El-Ebshihy, R. Pergl
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-028

Data Management Plan Implementation, Assessments, and Evaluations: Implications and Recommendations
B. Bishop, Peter Neish, Ji Hyun Kim, Raphaëlle Bats, Anthony J. Million, Jake Carlson, Heather Moulaison Sandy, Minh T. Pham
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-027

Implementing Informatics Tools with Data Management Plans for Disease Area Research
V. Navale, Matthew McAuliffe
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-024

Engaging with Researchers and Raising Awareness of FAIR and Open Science through the FAIR+ Implementation Survey Tool (FAIRIST)
Christine R. Kirkpatrick, Kevin Coakley, Julianne Christopher, Inês Dutra
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-032
Seven years after the publication of the seminal paper that introduced the concept of making research outputs Findable, Accessible, Interoperable, and Reusable (FAIR), researchers still struggle to understand how to implement the principles. For many researchers, FAIR promises long-term benefits for near-term effort, requires skills they have not yet acquired, and is one more item on a long list of unfunded mandates and onerous requirements for scientists. Even researchers who are required to adopt FAIR research practices, or who are convinced that they must make time for them, prefer just-in-time advice sized to their scientific artifacts and processes. Because most FAIR implementation guidance is general, it is difficult for researchers to adapt it to their own situation. Technological advances, especially in artificial intelligence (AI) and machine learning (ML), complicate FAIR adoption further, as researchers and data stewards consider how to make software, workflows, and models FAIR and reproducible. The FAIR+ Implementation Survey Tool (FAIRIST) mitigates the problem by systematically integrating research requirements with research proposals. FAIRIST takes into account new scholarly outputs, such as nanopublications and notebooks, and the various artifacts of AI research (data, models, workflows, and benchmarks). Researchers step through a self-serve survey and receive a table ready for use in their data management plan (DMP) and/or work plan, while gaining awareness of the FAIR Principles and Open Science concepts. FAIRIST is a model that uses part of the proposal process for outreach and for raising awareness of FAIR dimensions and considerations, while providing timely assistance for competitive proposals.
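
The abstract describes a survey whose answers become a table for a DMP, but it does not publish FAIRIST's questions or logic. The sketch below is only an illustration of that idea under stated assumptions. Questions are tagged by artifact type, only questions about artifacts the project actually produces are asked, and each "no" (or missing) answer becomes a planned action. The `Question` bank, `build_dmp_table` and `render` are hypothetical names, not FAIRIST's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Question:
    artifact: str      # research artifact the question applies to
    principle: str     # FAIR dimension the question addresses
    text: str
    action_if_no: str  # advice emitted when the answer is "no" or missing


# Hypothetical question bank; FAIRIST's actual survey content is not reproduced here.
QUESTIONS: List[Question] = [
    Question("data", "Findable", "Will the dataset get a persistent identifier?",
             "Deposit the dataset in a repository that assigns DOIs."),
    Question("data", "Reusable", "Will the dataset carry an explicit licence?",
             "Choose a licence and state it in the metadata."),
    Question("model", "Accessible", "Will trained model weights be shared?",
             "Plan where model weights will be archived and how to retrieve them."),
    Question("notebook", "Reusable", "Are notebook dependencies pinned?",
             "Record exact package versions alongside the notebook."),
]


def build_dmp_table(artifacts: List[str], answers: Dict[str, bool]) -> List[List[str]]:
    """Return table rows (header first) covering only the artifacts the project produces."""
    rows = [["Artifact", "FAIR principle", "Planned action"]]
    for q in QUESTIONS:
        if q.artifact not in artifacts:
            continue  # ask only about artifacts relevant to this project
        action = "Already planned" if answers.get(q.text, False) else q.action_if_no
        rows.append([q.artifact, q.principle, action])
    return rows


def render(rows: List[List[str]]) -> str:
    """Plain-text table, one row per line, columns separated by ' | '."""
    return "\n".join(" | ".join(row) for row in rows)


if __name__ == "__main__":
    table = build_dmp_table(["data", "notebook"], {QUESTIONS[0].text: True})
    print(render(table))
```

Filtering by artifact type is what keeps the advice "sized" to the project. A data-only project never sees model or notebook questions.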

Data Management for PalMod-II – A FAIR-Based Strategy for Data Handling in Large Climate Modeling Projects
Swati Gehlot, Karsten Peters-von Gehlen, Andrea Lammert, Hannes Thiemann
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-034
PalMod-II was a multi-institutional research project in Germany focused on enabling and performing global numerical climate simulations with state-of-the-art coupled Earth System Models spanning a full glacial cycle, from 130,000 years ago to the present and beyond. The main project goal was to produce the dataset resulting from these simulations and to make it available for reuse by the climate science community in line with the FAIR data principles. In this paper, we present the research data management (RDM) approach developed and employed in PalMod-II to progress towards that goal. The RDM approach was implemented by RDM professionals funded specifically by PalMod-II, which made it possible to provide RDM services tailored to the project’s needs. Compiling and maintaining a project-wide data management plan (DMP) proved essential for keeping the project on track, and the DMP served as the central focal point for all data-related aspects. These include designating the scientists responsible for data, allocating storage and computational resources on a high-performance computing system, documenting simulation output requirements, defining data standardisation, and setting up publication workflows in line with the FAIR data principles. Since the RDM approach executed in PalMod-II was the first of its kind for all project partners, extensive communication with the scientists on an equal footing was required to build trust and a collaborative atmosphere within the project. Finally, the RDM approach implemented in PalMod-II facilitated the publication of a flagship dataset for global reuse and will also be implemented in the follow-up project, PalMod-III.

Data Management Plans for the Photon and Neutron Communities
Marjolaine Bodin, Fredrik Bolmsten, Petra Aulin, T. Ivănoaica, A. Olivo, J. Malka, K. Wrona, Andy Götz
Data Science Journal, 2023. https://doi.org/10.5334/dsj-2023-030