International Journal of Digital Curation: Latest Articles

Reproducible and Attributable Materials Science Curation Practices: A Case Study

International journal of digital curation

Pub Date : 2024-07-28 DOI: 10.2218/ijdc.v18i1.940

Ye Li, Sara Wilson, Micah Altman

While small labs produce much of the fundamental experimental research in Materials Science and Engineering (MSE), little is known about their data management and sharing practices and the extent to which they promote trust in, and transparency of, the published research. In this research, we conduct a case study of a leading MSE research lab to characterize the limits of current data management and sharing practices concerning reproducibility and attribution. We systematically reconstruct the workflows underpinning four research projects by combining interviews, document review, and digital forensics. We then apply information graph analysis and computer-assisted retrospective auditing to identify where critical research information is unavailable or at risk. We find that while data management and sharing practices in this leading lab protect against computer and disk failure, they are insufficient to ensure reproducibility or correct attribution of work, especially when a group member withdraws before project completion. We conclude by recommending adjustments to MSE data management and sharing practices that promote trustworthiness and transparency by adding lightweight automated file-level auditing and automated data transfer processes.
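
As an illustration of what lightweight automated file-level auditing might involve, the sketch below records a SHA-256 digest for every file under a project directory and compares two such snapshots. It is not the authors' tooling, which the abstract does not describe; the function names `build_manifest` and `audit` and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def audit(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list added, removed, and modified files."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(name for name in shared if old[name] != new[name]),
    }
```

Run on a schedule, a comparison like this would flag files that disappear or change between snapshots, which is the kind of at-risk information the retrospective audit looks for. It does not by itself record who made a change.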

Citations: 0

Trusted Research Environments: Analysis of Characteristics and Data Availability

International journal of digital curation

Pub Date : 2024-07-22 DOI: 10.2218/ijdc.v18i1.939

Martin Weise, Andreas Rauber

Trusted Research Environments (TREs) enable the analysis of sensitive data under strict security assertions that protect the data with technical, organizational, and legal measures from (accidentally) being leaked outside the facility. While many TREs exist in Europe, little information is publicly available on their architecture, the descriptions of their building blocks, and their slight technical variations. To highlight these problems, an overview of the existing, publicly described TREs and a bibliography linking to the system descriptions are provided. Their technical characteristics, especially commonalities and variations, are analysed, and insight is provided into their data type characteristics and availability. The literature study shows that 47 TREs worldwide provide access to sensitive data, of which two-thirds provide data predominantly via secure remote access. Statistical offices (SOs) make available the majority of the sensitive data records included in this study.

Citations: 0

Preserving Secondary Knowledge

International journal of digital curation

Pub Date : 2024-07-08 DOI: 10.2218/ijdc.v18i1.930

Klaus Rechert, Rafael Gieschke

Emulation and migration are still our main tools for digital curation and preservation practice. Both strategies have been discussed extensively and have been demonstrated to be effective and applicable in various scenarios. Discussions have primarily centered on technical feasibility, workflow integration, and usability. However, one important aspect remains when discussing these two techniques: managing and preserving operational knowledge. Both approaches require specialized knowledge, but emulation in particular also requires future users to have a wide range of knowledge about past software and computer systems for successful operation. We investigate how this knowledge can be stored and utilized, and to what extent it can be rendered machine-actionable, using modern large language models. We demonstrate a proof-of-concept implementation that operates an emulated software environment through natural language.

Citations: 1

Factors Influencing Perceptions of Trust in Data Infrastructures

International journal of digital curation

Pub Date : 2024-05-13 DOI: 10.2218/ijdc.v18i1.921

Katharina Flicker, Andreas Rauber, Bettina Kern, Fajar J. Ekaputra

Trust is an essential pre-condition for the acceptance of digital infrastructures and services. Transparency has been identified as one mechanism for increasing trustworthiness. Yet it is difficult to assess to what extent and how exactly different aspects of transparency contribute to trust, or potentially impede it when the information provided is overwhelmingly complex. To address these issues, we performed two initial studies to help determine the factors that influence trust, focusing on transparency across a range of elements associated with data, data infrastructures and virtual research environments. On the one hand, we performed a survey among IT experts in the field of data science focusing on quality aspects in the context of re-using and sharing open source software, assessing issues such as the need for documentation, test cases, and accountability. On the other hand, we complemented this with a set of semi-structured interviews with senior researchers to address specific issues of the degree of transparency achievable with different approaches. These include, for example, the amount of transparency we can achieve with approaches from explainable AI, and the usefulness and limitations of data provenance in determining the suitability of data for reuse. Specifically, we consider mechanisms on three levels: technical, process-oriented, and social. Starting from attributes of trust in the “analogue world”, we aim to understand which of these can be applied in the digital world, how they differ, and what additional mechanisms need to be established in order to support trust in complex socio-technological processes and their emergent results when the traditional approaches can no longer be applied.

Citations: 0

Assessing Quality Variations in Early Career Researchers’ Data Management Plans

International journal of digital curation

Pub Date : 2024-04-14 DOI: 10.2218/ijdc.v18i1.873

Jukka Rantasaari

This paper aims to better understand early career researchers’ (ECRs’) research data management (RDM) competencies by assessing the contents and quality of data management plans (DMPs) developed during a multi-stakeholder RDM course. We also aim to identify differences between DMPs in relation to several background variables (e.g., discipline, course track). The Basics of Research Data Management (BRDM) course has been held in two multi-faculty, research-intensive universities in Finland since 2020. In this study, 223 ECRs’ DMPs created in the BRDM courses of 2020-2022 were assessed using the recommendations and criteria of the Finnish DMP Evaluation Guide + General Finnish DMP Guidance (FDEG). The median quality of the DMPs appeared to be satisfactory. The differences in rating according to FDEG’s three-point performance criteria were not statistically significant between DMPs developed in separate years, course tracks or disciplines. However, content analysis revealed differences between disciplines and course tracks in the DMPs’ key characteristics, such as sharing, storing, and preserving data. DMPs that contained a data table (DtDMPs) also differed highly significantly from prose DMPs. DtDMPs better acknowledged the data handling needs of different data types and improved the overall quality of a DMP. The results illustrate that the ECRs had learned the basic RDM competencies and grasped their significance to the integrity, reliability, and reusability of data. However, more focused further training is needed to reach advanced competency, especially in the areas of handling and sharing personal data, legal issues, long-term preservation, and funders’ data policies. Equally important for the cultural change in which RDM becomes an organic part of research practices is merging research support services, processes, and infrastructure into the research projects’ processes. Additionally, incentives are needed for sharing and reusing data.

Citations: 0

Community-based Curate-a-Thons to Enhance Preservation of Global Genetic Biodiversity Data

International journal of digital curation

Pub Date : 2024-02-11 DOI: 10.2218/ijdc.v18i1.891

Andrea L Pritt, Briana E. Wham, Rachel H. Toczydlowski, Eric D Crandall

Science, Technology, Engineering, and Mathematics (STEM) and Research Data Librarians collaborated with an international research team of conservation geneticists to create an instructional and practical guide combining genetic biodiversity initiatives and data curation. Over the course of two months, the academic librarians held multiple community-based Curate-a-Thons in which an international group of students, researchers, librarians, and faculty researchers tracked down publications and metadata for genomic sequence data, thus crowd-sourcing this metadata enhancement effort. This article details the successful Curate-a-Thon design and implementation process; the openly available instructional materials created and used to host the Curate-a-Thons; and the challenges and successes of these community-based events.

Citations: 0

Generation of Revision Identifier (rsid) Numbers in MS Word

International journal of digital curation

Pub Date : 2024-02-11 DOI: 10.2218/ijdc.v18i1.870

D. Spennemann, Clare L. Singh

The 2007 implementation of the Office Open XML standard for Microsoft Word introduced the assignment of individual revision save identifiers (Rsid) to document editing sessions that end in a save action. The relevant standards ECMA (2016) and ISO/IEC 29500-1:2016 (2016) stipulate that these Rsid should be allocated randomly but with increasing numerical value, thereby documenting the progress of the editing. As MS Word is the most ubiquitous word processing software, Rsid appear to be a useful tool to examine and provide evidence for a wide range of common document generation, editing and modification processes and file management operations, with implications for document analysis including, but not limited to, academic integrity issues in student assignment submissions (e.g. contract cheating). This paper presents the results of a series of experiments conducted to assess whether and how well MS Word implements the ECMA and ISO/IEC standards. The results show that the number of allocated Rsid indeed increases with each edit and save action, with the previous Rsid carried over and retained. The newly allocated Rsid, however, do not conform to the standard, as the numerical value of an Rsid associated with a save action may be larger or smaller than any or all of those allocated during previous save actions. The allocation of a new Rsid is not necessarily caused by an edit event: a new Rsid can also be generated if a file is saved as RTF or sent as an e-mail from within MS Word, even though the file was not edited in any way. Rsid numbers are not generated if a person opens an MS Word document, reads it and closes the file without saving, making this action impossible to detect. MS Word template files on a given machine contain document (root) Rsid numbers that are generated when a newly installed application is launched for the first time. As these will be embedded as legacy Rsid into every new file generated from that template file, they act as signatures for all MS Word documents that are created. The experiments have shown that user behaviour has a direct influence on the number of Rsid represented in a given file. Although the implementation of Office Open XML chosen by Microsoft is not compliant with the relevant standards, and thus Rsid cannot be used to determine the exact chronological order of all editing sequences within a given document, the Rsid retain their value for document forensics, as they are associated with specific edit events and illuminate the document writing and editing process.
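
For readers who want to inspect these identifiers themselves, the following is a minimal sketch of reading the Rsid list from a .docx file. It assumes the usual WordprocessingML layout, in which `word/settings.xml` holds a `w:rsids` element containing one `w:rsidRoot` and any number of `w:rsid` children with hexadecimal `w:val` attributes. The function name `read_rsids` is hypothetical, and this is not the procedure used in the paper's experiments.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by the w: prefix in .docx parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_rsids(docx_file):
    """Return (root_rsid, [rsid, ...]) from a .docx path or file-like object.

    Values are returned as the hexadecimal strings stored in the file,
    in the order they appear in word/settings.xml. That order is not
    assumed to reflect the chronological order of save actions.
    """
    with zipfile.ZipFile(docx_file) as zf:
        settings = ET.fromstring(zf.read("word/settings.xml"))
    rsids_el = settings.find(f"{{{W_NS}}}rsids")
    if rsids_el is None:
        return None, []
    val_attr = f"{{{W_NS}}}val"
    root_el = rsids_el.find(f"{{{W_NS}}}rsidRoot")
    root = root_el.get(val_attr) if root_el is not None else None
    values = [el.get(val_attr) for el in rsids_el.findall(f"{{{W_NS}}}rsid")]
    return root, values
```

Converting a value with `int(value, 16)` gives the numerical value discussed in the abstract. Because the stored list order may not match save order, comparing files saved at known points in time is a safer basis for any ordering claim.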

Citations: 0

Generation of Revision Identifier (rsid) Numbers in MS Word 在 MS Word 中生成修订版标识符 (rsid) 编号

International journal of digital curation

Pub Date : 2024-02-11 DOI: 10.2218/ijdc.v18i1.870

D. Spennemann, Clare L. Singh

The 2007 implementation of the Office Open XML standard for Microsoft Word introduced the assignation of individual revision save identifiers (Rsid) to document editing sessions that end in a save action. The relevant standards ECMA (2016) and ISO/ IEC 29500-1:2016 (2016) stipulate that these Rsid should be allocated randomised but with increasing numerical value, thereby documenting the progress of the editing. As MS Word is the most ubiquitous word processing software, Rsid appear to be a useful tool to examine and provide evidence for a wide range of common document generation editing and modification processes and file management operations, with implications for document analysis including, but not limited to academic integrity issues in student assignment submissions (e.g. contract cheating). This paper presents the results of a series of experiments conducted to assess whether and how well MS Word implements the ECMA and ISO/ IEC standards. The results show that the number of allocated Rsid indeed increases with each edit and save action, with the previous Rsids carried over and retained. The newly allocated Rsid, however, do not conform to the standard as the numerical value of a Rsid associated with a save action may be larger or smaller than any or all of those allocated during that of the previous save actions. The allocation of a new Rsid is not necessarily caused by an edit event but that a new Rsid can also be generated if a file is saved as rtf or if it is sent as an e-mail from within MS Word, although the file was not edited in any way. Rsid numbers are not generated if a person opens a MS Word document, reads it and closes the file without saving, making this action impossible to detect.MS Word template files on a given machine contain document (root) Rsid numbers that are generated when a newly installed application is launched for the first time. 
As these will be embedded as legacy Rsid into every new file generated from that template file, they act as signatures for all MS Word documents that are created.The experiments have shown that user behaviour has a direct influence on the number of Rsid represented in a given file. Although the implementation of Office Open XML chosen by Microsoft is not compliant with the relevant standards, and thus Rsid cannot be used determine the exact chronological order of all editing sequences within a given document, the Rsid retain their value for document forensics as they are associated with specific edit events, and illuminate the document writing and editing process.

2007 年实施的 Microsoft Word Office Open XML 标准为以保存操作结束的文档编辑会话分配了单个修订保存标识符（Rsid）。相关标准 ECMA (2016) 和 ISO/ IEC 29500-1:2016 (2016) 规定，这些 Rsid 应随机分配，但数值应不断增加，从而记录编辑进度。由于 MS Word 是最普遍的文字处理软件，Rsid 似乎是一种有用的工具，可用于检查各种常见文档的生成、编辑和修改过程以及文件管理操作，并为其提供证据，其对文档分析的影响包括但不限于学生作业提交中的学术诚信问题（如合同作弊）。本文介绍了一系列实验的结果，这些实验旨在评估 MS Word 是否以及如何很好地执行 ECMA 和 ISO/ IEC 标准。结果表明，分配的 Rsid 数量确实随着每次编辑和保存操作的进行而增加，之前的 Rsid 会被继承和保留。不过，新分配的 Rsid 并不符合标准，因为与保存操作相关的 Rsid 数值可能大于或小于之前保存操作中分配的任何或所有 Rsid。新 Rsid 的分配不一定是由编辑事件引起的，但如果文件被保存为 rtf 格式，或者从 MS Word 中以电子邮件的形式发送，也会产生新的 Rsid，尽管文件没有经过任何编辑。如果一个人打开一个 MS Word 文档，读完后没有保存就关闭了文件，则不会生成 Rsid 号码，因此无法检测到这一操作。给定机器上的 MS Word 模板文件包含文档（根）Rsid 号码，这些号码在首次启动新安装的应用程序时生成。实验表明，用户行为对特定文件中的 Rsid 数量有直接影响。尽管微软选择的 Office Open XML 实现不符合相关标准，因此 Rsid 无法用于确定给定文档中所有编辑序列的确切时间顺序，但 Rsid 仍具有文档取证价值，因为它们与特定的编辑事件相关联，并能阐明文档的编写和编辑过程。


Citations: 0

E-Preservation of Old and Rare Books: A Structured Approach for Creating a Digital Collection

International journal of digital curation

Pub Date : 2023-11-14 DOI: 10.2218/ijdc.v17i1.855

Sangeeta Chakravarty

Antique books and old, rare documents are fragile and vulnerable to a range of hazards, and preserving them over long periods is a real challenge. Since ancient times people have recorded their knowledge in writing, and later generations collected and stored these records as antique materials. They are now held in museums, libraries, archives, private households, and other places all over the world. Keeping such books and documents in good condition is a challenge for librarians, conservators, preservation administrators, and anyone else responsible for storing them. This paper discusses the digital preservation of one such collection, held by the Directorate of Historical and Antiquarian Studies (DHAS), Guwahati, Assam, India. DHAS is a wing of the Government of Assam, mandated mainly to collect, preserve, and research historical and antiquarian resources. Its collection is one of the oldest collections and has served as a study and research centre in Assam since 1928. A special drive was undertaken to digitally preserve an identified part of the collection, with grant support from the National Archives of India. The paper covers the entire project process, from formulating the project proposal to structuring the digital collection, and describes in sequence the steps involved in digitizing 241 old and rare books from the main DHAS collection.



Citations: 0