Con Connections: Detecting Fraud from Abstracts using Topological Data Analysis

2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA). Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00069

Sarah Tymochko, Julien Chaput, T. Doster, Emilie Purvine, Jackson Warley, T. Emerson

Pages: 403-408

Citations: 2

Abstract




In this paper we present a novel approach for identifying fraudulent papers from their titles and abstracts. The premise is that fraudulent research papers have holes in how they present their approach and findings. Because an abstract is intended to highlight key features of the approach as well as important conclusions, we seek to determine whether these assumed holes can be identified from analysis of abstracts alone. The data set is derived from papers sharing a single author, with labels determined by a formal linguistic analysis of the complete documents. To detect these logical and literary holes we use techniques from topological data analysis, which summarizes data based on the presence of multi-dimensional topological holes. We find that topological features derived through a combination of techniques from natural language processing and time-series analysis detect the fraudulent papers better than the natural language processing tools alone. Thus we conclude that the connections and holes present in the abstracts of research cons contribute to an ability to infer the scientific validity of the corresponding work.
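The abstract does not spell out the pipeline, so the following is only a minimal sketch of how a text-derived time series could yield topological features, not the authors' method. It assumes a hypothetical per-word signal (word length), a sliding-window delay embedding, and only 0-dimensional persistence (connected components) of a Vietoris-Rips filtration, computed with Kruskal's algorithm. The names `delay_embed`, `h0_death_times`, and `topological_summary` are illustrative.

```python
import math
from itertools import combinations


def delay_embed(series, dim=2, tau=1):
    """Sliding-window embedding: point i is (x[i], x[i+tau], ..., x[i+(dim-1)*tau])."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + k * tau] for k in range(dim)) for i in range(n)]


def h0_death_times(points):
    """Death times of connected components in a Vietoris-Rips filtration.

    Convention assumed here: an edge enters the filtration at the Euclidean
    distance between its endpoints. Every component is born at scale 0 and
    dies when it merges with another, so the death times are exactly the
    edge lengths of a minimum spanning tree (Kruskal's algorithm).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge; one dies
            deaths.append(length)
    return deaths


def topological_summary(deaths):
    """Hypothetical feature vector: (longest lifetime, mean lifetime)."""
    if not deaths:
        return (0.0, 0.0)
    return (max(deaths), sum(deaths) / len(deaths))


# Toy usage: word lengths stand in for whatever NLP-derived signal is used.
text = "we present a novel approach for identifying fraudulent papers"
signal = [len(word) for word in text.split()]
cloud = delay_embed(signal, dim=2, tau=1)
features = topological_summary(h0_death_times(cloud))
print(features)
```

Features like these could be passed to an ordinary classifier alongside the NLP features. Detecting higher-dimensional holes (loops, voids), which the abstract also refers to, would need a persistent homology library such as Ripser or GUDHI rather than this union-find shortcut.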
