{"title":"Text duplication of papers in four medical related fields","authors":"Ping Ni, Lianhui Shan, Yong Li, Xinying An","doi":"10.2478/jdis-2023-0024","DOIUrl":null,"url":null,"abstract":"Abstract Purpose To reveal the typical features of text duplication in papers from four medical fields: basic medicine, health management, pharmacology and pharmacy, and public health and preventive medicine. To analyze the reasons for duplication and provide suggestions for the management of medical academic misconduct. Design/methodology/approach In total, 2,469 representative Chinese journal papers were included in our research, which were submitted by researchers in 2020 and 2021. A plagiarism check was carried out using the Academic Misconduct Literature Check System (AMLC). We generated a corrected similarity index based on the AMLC general similarity index for further analysis. We compared the similarity indices of papers in four medical fields and revealed their trends over time; differences in similarity index between review and research articles were also analyzed according to the different fields. Further analysis of 143 papers suspected of plagiarism was also performed from the perspective of sections containing duplication and according to the field of research. Findings Papers in the field of pharmacology and pharmacy had the highest similarity index (8.67 ± 5.92%), which was significantly higher than that in other fields, except health management. The similarity index of review articles (9.77 ± 10.28%) was significantly higher than that of research articles (7.41 ± 6.26%). In total, 143 papers were suspected of plagiarism (5.80%) with similarity indices ≥ 15%; most were papers on health management (78, 54.55%), followed by public health and preventive medicine (38, 26.58%); 90.21% of the 143 papers had duplication in multiple sections, while only 9.79% had duplication in a single section. The distribution of sections with duplication varied among different fields; papers in pharmacology and pharmacy were more likely to have duplication in the data/methods and introduction/background sections, however, papers in health management were more likely to contain duplication in the introduction/background or results/discussion sections. Different structures for papers in different fields may have caused these differences. Research limitations There were three limitations to our research. Firstly, we observed that a small number of papers have been checked early. It is unknown who conducted the plagiarism check as this can be included in other evaluations, such as applications for Science and technology projects or awards. If the authors carried out the check, text with high similarity indices may have been excluded before submission, meaning the similarity index in our research may have been lower than the original value. Secondly, there were only four medical fields included in our research. Additional analysis on a wider scale is required in the future. Thirdly, only a general similarity index was calculated in our study; other similarity indices were not tested. Practical implications A comprehensive analysis of similarity indices in four medical fields was performed. We made several recommendations for the supervision of medical academic misconduct and the formation of criteria for defining suspected plagiarism for medical papers, as well as for the improved accuracy of text duplication checks. Originality/value We quantified the differences between the AMLC general similarity index and the corrected index, described the situation around text duplication and plagiarism in papers from four medical fields, and revealed differences in similarity indices between different article types. We also revealed differences in the sections containing duplication for papers with suspected plagiarism among different fields.","PeriodicalId":44622,"journal":{"name":"Journal of Data and Information Science","volume":"52 1","pages":"0"},"PeriodicalIF":1.5000,"publicationDate":"2023-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Data and Information Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/jdis-2023-0024","RegionNum":3,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Abstract Purpose To reveal the typical features of text duplication in papers from four medical fields: basic medicine, health management, pharmacology and pharmacy, and public health and preventive medicine. To analyze the reasons for duplication and provide suggestions for the management of medical academic misconduct. Design/methodology/approach In total, 2,469 representative Chinese journal papers were included in our research, which were submitted by researchers in 2020 and 2021. A plagiarism check was carried out using the Academic Misconduct Literature Check System (AMLC). We generated a corrected similarity index based on the AMLC general similarity index for further analysis. We compared the similarity indices of papers in four medical fields and revealed their trends over time; differences in similarity index between review and research articles were also analyzed according to the different fields. Further analysis of 143 papers suspected of plagiarism was also performed from the perspective of sections containing duplication and according to the field of research. Findings Papers in the field of pharmacology and pharmacy had the highest similarity index (8.67 ± 5.92%), which was significantly higher than that in other fields, except health management. The similarity index of review articles (9.77 ± 10.28%) was significantly higher than that of research articles (7.41 ± 6.26%). In total, 143 papers were suspected of plagiarism (5.80%) with similarity indices ≥ 15%; most were papers on health management (78, 54.55%), followed by public health and preventive medicine (38, 26.58%); 90.21% of the 143 papers had duplication in multiple sections, while only 9.79% had duplication in a single section. The distribution of sections with duplication varied among different fields; papers in pharmacology and pharmacy were more likely to have duplication in the data/methods and introduction/background sections, however, papers in health management were more likely to contain duplication in the introduction/background or results/discussion sections. Different structures for papers in different fields may have caused these differences. Research limitations There were three limitations to our research. Firstly, we observed that a small number of papers have been checked early. It is unknown who conducted the plagiarism check as this can be included in other evaluations, such as applications for Science and technology projects or awards. If the authors carried out the check, text with high similarity indices may have been excluded before submission, meaning the similarity index in our research may have been lower than the original value. Secondly, there were only four medical fields included in our research. Additional analysis on a wider scale is required in the future. Thirdly, only a general similarity index was calculated in our study; other similarity indices were not tested. Practical implications A comprehensive analysis of similarity indices in four medical fields was performed. We made several recommendations for the supervision of medical academic misconduct and the formation of criteria for defining suspected plagiarism for medical papers, as well as for the improved accuracy of text duplication checks. Originality/value We quantified the differences between the AMLC general similarity index and the corrected index, described the situation around text duplication and plagiarism in papers from four medical fields, and revealed differences in similarity indices between different article types. We also revealed differences in the sections containing duplication for papers with suspected plagiarism among different fields.
期刊介绍:
JDIS devotes itself to the study and application of the theories, methods, techniques, services, infrastructural facilities using big data to support knowledge discovery for decision & policy making. The basic emphasis is big data-based, analytics centered, knowledge discovery driven, and decision making supporting. The special effort is on the knowledge discovery to detect and predict structures, trends, behaviors, relations, evolutions and disruptions in research, innovation, business, politics, security, media and communications, and social development, where the big data may include metadata or full content data, text or non-textural data, structured or non-structural data, domain specific or cross-domain data, and dynamic or interactive data.
The main areas of interest are:
(1) New theories, methods, and techniques of big data based data mining, knowledge discovery, and informatics, including but not limited to scientometrics, communication analysis, social network analysis, tech & industry analysis, competitive intelligence, knowledge mapping, evidence based policy analysis, and predictive analysis.
(2) New methods, architectures, and facilities to develop or improve knowledge infrastructure capable to support knowledge organization and sophisticated analytics, including but not limited to ontology construction, knowledge organization, semantic linked data, knowledge integration and fusion, semantic retrieval, domain specific knowledge infrastructure, and semantic sciences.
(3) New mechanisms, methods, and tools to embed knowledge analytics and knowledge discovery into actual operation, service, or managerial processes, including but not limited to knowledge assisted scientific discovery, data mining driven intelligent workflows in learning, communications, and management.
Specific topic areas may include:
Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web metrics
Scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Development of data and information services