Examining and detecting academic misconduct in written documents using revision save identifier numbers in MS Word as exemplified by multiple scenarios
Dirk HR. Spennemann, Rudolf J. Spennemann, Clare L. Singh
Forensic Science International-Digital Investigation, volume 51, Article 301821. Published 2024-09-10. DOI: 10.1016/j.fsidi.2024.301821. https://www.sciencedirect.com/science/article/pii/S2666281724001458
Abstract
Deliberate academic misconduct by students often relies on the use of segments of externally authored text, generated either by commercial contract authoring services or by generative artificial intelligence language models. While revision save identifier (rsid) numbers in Microsoft Word files are associated with the edit and save actions of a document, MS Word does not adhere to the ECMA specifications for Office Open XML. Existing literature shows that digital forensics using rsid requires access to multiple document versions or to the user's machine. In cases of alleged academic misconduct, usually only the submitted files are available for digital forensic examination, coupled with assertions by the alleged perpetrators about the document generation and editing process. This paper is a detailed exploratory study that provides educators and digital forensic scientists with tools to examine a single document for the veracity of various commonly asserted scenarios of document generation and editing. It is based on a series of experiments that ascertained whether and how common edit and document generation actions, such as copy, paste, and the insertion of blocks of text from other documents, leave tell-tale traces in the rsid encoding that is embedded in all MS Word documents. While digital forensics can illuminate document generation processes, the actions that led to these may have innocuous explanations. In consequence, this paper also provides academic misconduct investigators with a set of prompts to guide the interview with alleged perpetrators and to glean the information required for cross-correlation with observations based on the rsid data.
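To make the rsid encoding more concrete, the following minimal Python sketch shows where the values sit in a single .docx file. It is not the authors' method, which the abstract does not describe. It only assumes the usual .docx layout, in which `word/settings.xml` lists the rsid values Word has recorded and `word/document.xml` carries `w:rsid*` attributes on paragraphs and runs. The function names `declared_rsids` and `rsid_counts` are hypothetical.

```python
import zipfile
from collections import Counter
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w:" prefix in .docx parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def declared_rsids(source):
    """Return the rsid values listed as <w:rsid> elements in word/settings.xml.

    `source` is a path or a binary file-like object holding a .docx file.
    <w:rsidRoot> is a different element and is deliberately not included.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/settings.xml"))
    return [elem.get(f"{{{W_NS}}}val") for elem in root.iter(f"{{{W_NS}}}rsid")]


def rsid_counts(source):
    """Count how often each rsid value occurs on w:rsid* attributes
    (for example w:rsidR, w:rsidRPr, w:rsidRDefault) in word/document.xml."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    prefix = "{" + W_NS + "}rsid"
    counts = Counter()
    for elem in root.iter():
        for name, value in elem.attrib.items():
            if name.startswith(prefix):
                counts[value] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (the file path is supplied by the caller):
    #   python rsid_sketch.py some_document.docx
    if len(sys.argv) > 1:
        path = sys.argv[1]
        print("rsids listed in settings.xml:", declared_rsids(path))
        for value, n in rsid_counts(path).most_common():
            print(value, n)
```

Such a tally only describes how rsid values are distributed across a document. As the abstract stresses, the actions behind any pattern may have innocuous explanations, so the output would need to be cross-checked against the author's own account of how the document was produced.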