Franciskus Antonius, Myagmarsuren Orosoo, Aanandha Saravanan K, Indrajit Patra, Prema S
{"title":"通过高级自然语言处理和Smith-Waterman算法的E-BERT框架增强抄袭检测","authors":"Franciskus Antonius, Myagmarsuren Orosoo, Aanandha Saravanan K, Indrajit Patra, Prema S","doi":"10.14569/ijacsa.2023.0140944","DOIUrl":null,"url":null,"abstract":"Effective detection has been extremely difficult due to plagiarism's pervasiveness throughout a variety of fields, including academia and research. Increasingly complex plagiarism detection strategies are being used by people, making traditional approaches ineffective. The assessment of plagiarism involves a comprehensive examination encompassing syntactic, lexical, semantic, and structural facets. In contrast to traditional string-matching techniques, this investigation adopts a sophisticated Natural Language Processing (NLP) framework. The preprocessing phase entails a series of intricate steps ultimately refining the raw text data. The crux of this methodology lies in the integration of two distinct metrics within the Encoder Representation from Transformers (E-BERT) approach, effectively facilitating a granular exploration of textual similarity. Within the realm of NLP, the amalgamation of Deep and Shallow approaches serves as a lens to delve into the intricate nuances of the text, uncovering underlying layers of meaning. The discerning outcomes of this research unveil the remarkable proficiency of Deep NLP in promptly identifying substantial revisions. Integral to this innovation is the novel utilization of the Waterman algorithm and an English-Spanish dictionary, which contribute to the selection of optimal attributes. Comparative evaluations against alternative models employing distinct encoding methodologies, along with logistic regression as a classifier underscore the potency of the proposed implementation. The culmination of extensive experimentation substantiates the system's prowess, boasting an impressive 99.5% accuracy rate in extracting instances of plagiarism. This research serves as a pivotal advancement in the domain of plagiarism detection, ushering in effective and sophisticated methods to combat the growing spectre of unoriginal content.","PeriodicalId":13824,"journal":{"name":"International Journal of Advanced Computer Science and Applications","volume":"7 1","pages":"0"},"PeriodicalIF":0.7000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhanced Plagiarism Detection Through Advanced Natural Language Processing and E-BERT Framework of the Smith-Waterman Algorithm\",\"authors\":\"Franciskus Antonius, Myagmarsuren Orosoo, Aanandha Saravanan K, Indrajit Patra, Prema S\",\"doi\":\"10.14569/ijacsa.2023.0140944\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Effective detection has been extremely difficult due to plagiarism's pervasiveness throughout a variety of fields, including academia and research. Increasingly complex plagiarism detection strategies are being used by people, making traditional approaches ineffective. The assessment of plagiarism involves a comprehensive examination encompassing syntactic, lexical, semantic, and structural facets. In contrast to traditional string-matching techniques, this investigation adopts a sophisticated Natural Language Processing (NLP) framework. The preprocessing phase entails a series of intricate steps ultimately refining the raw text data. The crux of this methodology lies in the integration of two distinct metrics within the Encoder Representation from Transformers (E-BERT) approach, effectively facilitating a granular exploration of textual similarity. Within the realm of NLP, the amalgamation of Deep and Shallow approaches serves as a lens to delve into the intricate nuances of the text, uncovering underlying layers of meaning. The discerning outcomes of this research unveil the remarkable proficiency of Deep NLP in promptly identifying substantial revisions. Integral to this innovation is the novel utilization of the Waterman algorithm and an English-Spanish dictionary, which contribute to the selection of optimal attributes. Comparative evaluations against alternative models employing distinct encoding methodologies, along with logistic regression as a classifier underscore the potency of the proposed implementation. The culmination of extensive experimentation substantiates the system's prowess, boasting an impressive 99.5% accuracy rate in extracting instances of plagiarism. This research serves as a pivotal advancement in the domain of plagiarism detection, ushering in effective and sophisticated methods to combat the growing spectre of unoriginal content.\",\"PeriodicalId\":13824,\"journal\":{\"name\":\"International Journal of Advanced Computer Science and Applications\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Advanced Computer Science and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14569/ijacsa.2023.0140944\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advanced Computer Science and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14569/ijacsa.2023.0140944","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Enhanced Plagiarism Detection Through Advanced Natural Language Processing and E-BERT Framework of the Smith-Waterman Algorithm
Effective detection has been extremely difficult due to plagiarism's pervasiveness throughout a variety of fields, including academia and research. Increasingly complex plagiarism detection strategies are being used by people, making traditional approaches ineffective. The assessment of plagiarism involves a comprehensive examination encompassing syntactic, lexical, semantic, and structural facets. In contrast to traditional string-matching techniques, this investigation adopts a sophisticated Natural Language Processing (NLP) framework. The preprocessing phase entails a series of intricate steps ultimately refining the raw text data. The crux of this methodology lies in the integration of two distinct metrics within the Encoder Representation from Transformers (E-BERT) approach, effectively facilitating a granular exploration of textual similarity. Within the realm of NLP, the amalgamation of Deep and Shallow approaches serves as a lens to delve into the intricate nuances of the text, uncovering underlying layers of meaning. The discerning outcomes of this research unveil the remarkable proficiency of Deep NLP in promptly identifying substantial revisions. Integral to this innovation is the novel utilization of the Waterman algorithm and an English-Spanish dictionary, which contribute to the selection of optimal attributes. Comparative evaluations against alternative models employing distinct encoding methodologies, along with logistic regression as a classifier underscore the potency of the proposed implementation. The culmination of extensive experimentation substantiates the system's prowess, boasting an impressive 99.5% accuracy rate in extracting instances of plagiarism. This research serves as a pivotal advancement in the domain of plagiarism detection, ushering in effective and sophisticated methods to combat the growing spectre of unoriginal content.
期刊介绍:
IJACSA is a scholarly computer science journal representing the best in research. Its mission is to provide an outlet for quality research to be publicised and published to a global audience. The journal aims to publish papers selected through rigorous double-blind peer review to ensure originality, timeliness, relevance, and readability. In sync with the Journal''s vision "to be a respected publication that publishes peer reviewed research articles, as well as review and survey papers contributed by International community of Authors", we have drawn reviewers and editors from Institutions and Universities across the globe. A double blind peer review process is conducted to ensure that we retain high standards. At IJACSA, we stand strong because we know that global challenges make way for new innovations, new ways and new talent. International Journal of Advanced Computer Science and Applications publishes carefully refereed research, review and survey papers which offer a significant contribution to the computer science literature, and which are of interest to a wide audience. Coverage extends to all main-stream branches of computer science and related applications