This paper provides a systematic analysis of publications that discuss data curation in interdisciplinary and highly collaborative research (IHCR). Using content analysis methodology, it examined 159 publications and identified patterns in definitions of interdisciplinarity, projects’ participants and methodologies, and approaches to data curation. The findings suggest that data is a prominent component in interdisciplinarity. In addition to crossing disciplinary and other boundaries, IHCR is defined as curating and integrating heterogeneous data and creating new forms of knowledge from it. Using personal experiences and descriptive approaches, the publications discussed challenges that data curation in IHCR faces, including an increased overhead in coordination and management, lack of consistent metadata practices, and custom infrastructure that makes interoperability across projects, domains, and repositories difficult. The paper concludes with suggestions for future research.
{"title":"Data Curation in Interdisciplinary and Highly Collaborative Research","authors":"Inna Kouper","doi":"10.2218/ijdc.v17i1.835","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.835","url":null,"abstract":"This paper provides a systematic analysis of publications that discuss data curation in interdisciplinary and highly collaborative research (IHCR). Using content analysis methodology, it examined 159 publications and identified patterns in definitions of interdisciplinarity, projects’ participants and methodologies, and approaches to data curation. The findings suggest that data is a prominent component in interdisciplinarity. In addition to crossing disciplinary and other boundaries, IHCR is defined as curating and integrating heterogeneous data and creating new forms of knowledge from it. Using personal experiences and descriptive approaches, the publications discussed challenges that data curation in IHCR faces, including an increased overhead in coordination and management, lack of consistent metadata practices, and custom infrastructure that makes interoperability across projects, domains, and repositories difficult. The paper concludes with suggestions for future research.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135407868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this article I describe the process and results of tracking a citation from a data repository through the article publication process and trying to add a citation event to one of our DOIs. I also discuss some other confusing aspects related to citation counts as indicated in various systems, including reference managers, the publisher’s perspective, aggregators, and DOI minters. I discovered numerous problems with citations. Addressing these problems is important as citations can be key to determining both the original use and reuse of a dataset, especially for repositories that do not track usage by requiring people to login or provide an email to download a dataset. The lack of transparency in some data citation systems and processes obscures how and where data is being used.
{"title":"If Data is Used in the Forest and No-one is Around to Hear it, Did it Happen? A Citation Count Investigation","authors":"S. Borda","doi":"10.2218/ijdc.v17i1.830","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.830","url":null,"abstract":"In this article I describe the process and results of tracking a citation from a data repository through the article publication process and trying to add a citation event to one of our DOIs. I also discuss some other confusing aspects related to citation counts as indicated in various systems, including reference managers, the publisher’s perspective, aggregators, and DOI minters. I discovered numerous problems with citations. Addressing these problems is important as citations can be key to determining both the original use and reuse of a dataset, especially for repositories that do not track usage by requiring people to login or provide an email to download a dataset. The lack of transparency in some data citation systems and processes obscures how and where data is being used. ","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46687987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reid I. Boehm, H. Calkins, Patricia B. Condon, J. Petters, Rachel Woodbrook
Federal funding agencies in the United States (U.S.) continue to work towards implementing their plans to increase public access to funded research and comply with the 2013 Office of Science and Technology memo Increasing Access to the Results of Federally Funded Scientific Research. In this article we report on an analysis of research data sharing policy documents from 17 U.S. federal funding agencies as of February 2021. Our analysis is guided by two questions: 1.) What do the findings suggest about the current state of and trends in U.S. federal funding agency data sharing requirements? 2.) In what ways are universities, institutions, associations, and researchers affected by and responding to these policies? Over the past five years, policy updates were common among these agencies and several themes have been thoroughly developed in that time; however, uncertainty remains around how funded researchers are expected to satisfy these policy requirements.
{"title":"Analysis of U.S. Federal Funding Agency Data Sharing Policies","authors":"Reid I. Boehm, H. Calkins, Patricia B. Condon, J. Petters, Rachel Woodbrook","doi":"10.2218/ijdc.v17i1.791","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.791","url":null,"abstract":"Federal funding agencies in the United States (U.S.) continue to work towards implementing their plans to increase public access to funded research and comply with the 2013 Office of Science and Technology memo Increasing Access to the Results of Federally Funded Scientific Research. In this article we report on an analysis of research data sharing policy documents from 17 U.S. federal funding agencies as of February 2021. Our analysis is guided by two questions: 1.) What do the findings suggest about the current state of and trends in U.S. federal funding agency data sharing requirements? 2.) In what ways are universities, institutions, associations, and researchers affected by and responding to these policies? Over the past five years, policy updates were common among these agencies and several themes have been thoroughly developed in that time; however, uncertainty remains around how funded researchers are expected to satisfy these policy requirements.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48880626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miranda Barnes, Ross Higman, G. Cole, Rupert Gatti, J. Fry
This brief report outlines some initial findings and challenges identified by the Community-Led Open Publication Infrastructures for Monographs (COPIM) project when looking to archive and preserve open access books produced by small, scholar-led presses. This paper is based on the research conducted by Work Package 7 in COPIM, which has a focus on the preservation and archiving of open access monographs in all their complexity, along with any accompanying materials.
{"title":"Long-Term Preservation and Reusability of Open Access Scholar-Led Press Monographs","authors":"Miranda Barnes, Ross Higman, G. Cole, Rupert Gatti, J. Fry","doi":"10.2218/ijdc.v17i1.826","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.826","url":null,"abstract":"This brief report outlines some initial findings and challenges identified by the Community-Led Open Publication Infrastructures for Monographs (COPIM) project when looking to archive and preserve open access books produced by small, scholar-led presses. This paper is based on the research conducted by Work Package 7 in COPIM, which has a focus on the preservation and archiving of open access monographs in all their complexity, along with any accompanying materials. ","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45224253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or data about data, is critical to making research outputs adequately documented and FAIR. Aiming to contribute to the discussions on the development of metadata for research outputs, I conducted an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. I use the content of over 40,000 datasets from the Harvard Dataverse research data repository as my sample for the cluster analysis. I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample, no meaningful clusters can be identified. For the result interpretation, I use the metadata standard employed by DataCite, a leading organization for documenting a scholarly record, and map existing resource types to my results. About 65% of the sample can be described with a single-type metadata (such as Dataset, Software orReport), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a Collection, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a Replication resource metadata type. Such resource type would be particularly useful in facilitating research reproducibility.
{"title":"Cluster Analysis of Open Research Data: A Case for Replication Metadata","authors":"Ana Trisovic","doi":"10.2218/ijdc.v17i1.833","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.833","url":null,"abstract":"Research data are often released upon journal publication to enable result verification and reproducibility. For that reason, research dissemination infrastructures typically support diverse datasets coming from numerous disciplines, from tabular data and program code to audio-visual files. Metadata, or data about data, is critical to making research outputs adequately documented and FAIR. Aiming to contribute to the discussions on the development of metadata for research outputs, I conducted an exploratory analysis to determine how research datasets cluster based on what researchers organically deposit together. I use the content of over 40,000 datasets from the Harvard Dataverse research data repository as my sample for the cluster analysis. I find that the majority of the clusters are formed by single-type datasets, while in the rest of the sample, no meaningful clusters can be identified. For the result interpretation, I use the metadata standard employed by DataCite, a leading organization for documenting a scholarly record, and map existing resource types to my results. About 65% of the sample can be described with a single-type metadata (such as Dataset, Software orReport), while the rest would require aggregate metadata types. Though DataCite supports an aggregate type such as a Collection, I argue that a significant number of datasets, in particular those containing both data and code files (about 20% of the sample), would be more accurately described as a Replication resource metadata type. Such resource type would be particularly useful in facilitating research reproducibility.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135361093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Guirlet, Gaia Bongi, Elise Point, Grégoire Urvoy, René Schneider
As a contribution to the general effort in research to generalize and improve the practices of Open Research Data (ORD), we developed a model conceptualizing the degrees of maturity of a research community in terms of ORD. This model may be used to assess the ORD capacity or maturity level of a specific research community, to strengthen the use of standards with respect to ORD within this community, and to increase its ORD maturity level. We present the background and our motivations for developing such an instrument as well as the reasoning leading to its design. We present its elements in detail and discuss possible applications.
{"title":"Proposal for a Maturity Continuum Model for Open Research Data","authors":"M. Guirlet, Gaia Bongi, Elise Point, Grégoire Urvoy, René Schneider","doi":"10.2218/ijdc.v17i1.821","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.821","url":null,"abstract":"As a contribution to the general effort in research to generalize and improve the practices of Open Research Data (ORD), we developed a model conceptualizing the degrees of maturity of a research community in terms of ORD. This model may be used to assess the ORD capacity or maturity level of a specific research community, to strengthen the use of standards with respect to ORD within this community, and to increase its ORD maturity level. \u0000We present the background and our motivations for developing such an instrument as well as the reasoning leading to its design. We present its elements in detail and discuss possible applications. ","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43025731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper looks at the question of how and why to bring about greater reusability of Research Platforms (variously called Virtual Laboratories, Virtual Research Environments, or Science Gateways). It begins with some context for the Australian Research Data Commons, where the authors are based. It then examines the infrastructure concerns that are driving the need for platforms to be created and remain sustainable, and the connection from this to reusability. The paper then proceeds to discuss the ways in which FAIR is being extended to a range of research objects and infrastructure elements, before reviewing the work of the FAIR4VREs WG. The core of the paper is an examination, with examples or case studies, of four different paradigms for platform reusability: accessing, adopting, adapting, and abstracting. The paper concludes by examining actions undertaken by the ARDC to increase the likelihood of reusability.
{"title":"Putting the R into PlatfoRms","authors":"K. Levett, Jonathan Smillie, A. Treloar","doi":"10.2218/ijdc.v17i1.843","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.843","url":null,"abstract":"This paper looks at the question of how and why to bring about greater reusability of Research Platforms (variously called Virtual Laboratories, Virtual Research Environments, or Science Gateways). It begins with some context for the Australian Research Data Commons, where the authors are based. It then examines the infrastructure concerns that are driving the need for platforms to be created and remain sustainable, and the connection from this to reusability. The paper then proceeds to discuss the ways in which FAIR is being extended to a range of research objects and infrastructure elements, before reviewing the work of the FAIR4VREs WG. The core of the paper is an examination, with examples or case studies, of four different paradigms for platform reusability: accessing, adopting, adapting, and abstracting. The paper concludes by examining actions undertaken by the ARDC to increase the likelihood of reusability.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45855757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an experiment, the Research Data Journal for the Humanities and Social Sciences (RDJ) has temporarily extended the usual format of the online journal with so-called ‘showcases’, separate web pages containing a quick introduction to a dataset, embedded multimedia, interactive components, and facilities to directly preview and explore the dataset described. The aim was to create a coherent hyper document with content communicated via different media (multimodality) and provide space for new forms of scientific publication such as executable papers (e.g. Jupyter notebooks). This paper discusses the objectives, technical implementations, and the need for innovation in data publishing considering the advanced possibilities of today's digital modes of communication. The data showcases experiment proved to be a useful starting point for an exploration of related developments within and outside the humanities and social sciences. It turns out that small-scale experiments are relatively easy to perform thanks to the easy availability of digital technology. However, real innovation in publishing affects organization and infrastructure and requires the joint effort of publishers, editors, data repositories, and authors. It implies a thorough update of the concept of publication and adaptation of the production process. This paper also pays attention to these obstacles to taking new paths.
{"title":"Data Showcases: the Data Journal in a Multimodal World","authors":"L. Breure, P. Doorn, H. Voorbij","doi":"10.2218/ijdc.v17i1.789","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.789","url":null,"abstract":" \u0000 As an experiment, the Research Data Journal for the Humanities and Social Sciences (RDJ) has temporarily extended the usual format of the online journal with so-called ‘showcases’, separate web pages containing a quick introduction to a dataset, embedded multimedia, interactive components, and facilities to directly preview and explore the dataset described. The aim was to create a coherent hyper document with content communicated via different media (multimodality) and provide space for new forms of scientific publication such as executable papers (e.g. Jupyter notebooks). This paper discusses the objectives, technical implementations, and the need for innovation in data publishing considering the advanced possibilities of today's digital modes of communication. The data showcases experiment proved to be a useful starting point for an exploration of related developments within and outside the humanities and social sciences. It turns out that small-scale experiments are relatively easy to perform thanks to the easy availability of digital technology. However, real innovation in publishing affects organization and infrastructure and requires the joint effort of publishers, editors, data repositories, and authors. It implies a thorough update of the concept of publication and adaptation of the production process. This paper also pays attention to these obstacles to taking new paths.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42515062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Big social research repurposes existing data from online sources such as social media, blogs, or online forums, with a goal of advancing knowledge of human behavior and social phenomena. Big social research also presents an array of challenges that can prevent data sharing and reuse. This brief report presents an overview of a larger study that aims to understand the data curation implications of big social research to support use and reuse of big social data. The study, which is based in the United States, identifies six key issues relating to big social research and big social data curation through a review of the literature. It then further investigates perceptions and practices relating to these six key issues through semi-structured interviews with big social researchers and data curators. This report concludes with implications for data curation practice: metadata and documentation, connecting with researchers throughout the research process, data repository services, and advocating for community standards. Supporting responsible practices for using big social data can help scale up social science research, thus enhancing our understanding of human behavior and social phenomena.
{"title":"Data Curation Strategies to Support Responsible Big Social Research and Big Social Data Reuse","authors":"Sara Mannheimer","doi":"10.2218/ijdc.v17i1.823","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.823","url":null,"abstract":"Big social research repurposes existing data from online sources such as social media, blogs, or online forums, with a goal of advancing knowledge of human behavior and social phenomena. Big social research also presents an array of challenges that can prevent data sharing and reuse. \u0000This brief report presents an overview of a larger study that aims to understand the data curation implications of big social research to support use and reuse of big social data. The study, which is based in the United States, identifies six key issues relating to big social research and big social data curation through a review of the literature. It then further investigates perceptions and practices relating to these six key issues through semi-structured interviews with big social researchers and data curators. \u0000This report concludes with implications for data curation practice: metadata and documentation, connecting with researchers throughout the research process, data repository services, and advocating for community standards. Supporting responsible practices for using big social data can help scale up social science research, thus enhancing our understanding of human behavior and social phenomena.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45352614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web. By understanding patrons through multiple lenses – daily productivity habits, motivations, and scholarly needs – librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility for long-term reproducibility of a multitude of scholarship.
{"title":"Who Writes Scholarly Code?","authors":"Sarah Nguyễn, Vicky Rampin","doi":"10.2218/ijdc.v17i1.839","DOIUrl":"https://doi.org/10.2218/ijdc.v17i1.839","url":null,"abstract":"This paper presents original research about the behaviours, histories, demographics, and motivations of scholars who code, specifically how they interact with version control systems locally and on the Web. By understanding patrons through multiple lenses – daily productivity habits, motivations, and scholarly needs – librarians and archivists can tailor services for software management, curation, and long-term reuse, raising the possibility for long-term reproducibility of a multitude of scholarship.","PeriodicalId":87279,"journal":{"name":"International journal of digital curation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46909924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}