Con Connections: Detecting Fraud from Abstracts using Topological Data Analysis

Sarah Tymochko, Julien Chaput, T. Doster, Emilie Purvine, Jackson Warley, T. Emerson
Published in: 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 403-408
Publication date: 2021-12-01
DOI: https://doi.org/10.1109/ICMLA52953.2021.00069
Citations: 2

Abstract

In this paper we present a novel approach for identifying fraudulent papers from their titles and abstracts. The premise of the approach is that fraudulent research papers contain holes in the presentation of their approach and findings. Because an abstract is intended to highlight key features of the approach as well as important conclusions, the authors seek to determine whether these assumed holes can be identified from analysis of abstracts alone. The data set considered is derived from papers sharing a single author, with labels determined by a formal linguistic analysis of the complete documents. To detect these logical and literary holes we use techniques from topological data analysis, which summarizes data based on the presence of multi-dimensional topological holes. We find that topological features derived through a combination of techniques from natural language processing and time-series analysis detect the fraudulent papers better than the natural language processing tools alone. Thus we conclude that the connections and holes present in the abstracts of research cons contribute to an ability to infer the scientific validity of the corresponding work.
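The abstract does not spell out the pipeline, so the following is only a minimal sketch of the general idea of treating text as a time series and summarizing it topologically. Everything in it is an assumption made for illustration: word length stands in for a real natural language processing feature, `sliding_window` is a hypothetical helper for a delay embedding, and `h0_persistence` computes only the simplest topological summary (0-dimensional persistence, which tracks when connected components of the point cloud merge), not the higher-dimensional holes the paper refers to.

```python
import math
from itertools import combinations


def sliding_window(series, dim, delay=1):
    """Delay embedding: turn a 1-D series into a list of points in R^dim."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + j * delay] for j in range(dim)) for i in range(n)]


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every component is born at scale 0 and dies at the length of the edge
    that first merges it into another component. The one component that
    never dies is omitted, so n points yield n - 1 death values.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process all pairwise edges from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge at this scale
            deaths.append(length)
    return deaths


# Toy signal (assumption): word lengths stand in for a real text feature.
text = "we present a novel approach for identifying fraudulent papers"
signal = [len(word) for word in text.split()]
cloud = sliding_window(signal, dim=2)
deaths = h0_persistence(cloud)
```

In a sketch like this, the list `deaths` (or summary statistics of it) would serve as a feature vector for a downstream classifier. Real persistent homology libraries also compute higher-dimensional features such as loops, which this toy version leaves out.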