Sarah Tymochko, Julien Chaput, T. Doster, Emilie Purvine, Jackson Warley, T. Emerson
2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 403-408, published 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00069 (https://doi.org/10.1109/ICMLA52953.2021.00069)
Con Connections: Detecting Fraud from Abstracts using Topological Data Analysis
In this paper we present a novel approach for identifying fraudulent papers from their titles and abstracts. The premise is that the presentation of the approach and findings in fraudulent research papers contains holes. Because an abstract is intended to highlight the key features of the approach as well as its important conclusions, we seek to determine whether these assumed holes can be identified from analysis of abstracts alone. The data set considered is derived from papers sharing a single author, with labels determined by a formal linguistic analysis of the complete documents. To detect these logical and literary holes, we use techniques from topological data analysis, which summarizes data based on the presence of multi-dimensional topological holes. We find that topological features derived through a combination of techniques from natural language processing and time-series analysis detect the fraudulent papers better than the natural language processing tools alone. We therefore conclude that the connections and holes present in the abstracts of research cons contribute to an ability to infer the scientific validity of the corresponding work.
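The abstract does not describe the pipeline in detail, so what follows is only a minimal sketch of how a text-derived time series might be turned into a topological summary. It is not the authors' method. It assumes each abstract has already been reduced to one number per sentence (the `signal` values are made up), uses a sliding-window (delay) embedding to turn that series into a point cloud, and computes only zero-dimensional persistence, that is, the scales at which connected components merge. Higher-dimensional holes would normally be computed with a dedicated persistent homology library. The names `delay_embed` and `h0_persistence` are hypothetical.

```python
import math
from itertools import combinations


def delay_embed(series, dim=2, tau=1):
    """Sliding-window embedding: map a scalar series to points in R^dim."""
    n = len(series) - (dim - 1) * tau
    return [tuple(series[i + j * tau] for j in range(dim)) for i in range(n)]


def h0_persistence(points):
    """Death scales of connected components in a Vietoris-Rips filtration.

    Every point is its own component at scale 0. Edges are added in order
    of length; a component dies when an edge first merges it into another.
    This is Kruskal's algorithm with a union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # len(points) - 1 finite deaths; one component never dies.
    return deaths


# Made-up per-sentence scores for one abstract (assumption, not real data).
signal = [0.0, 0.1, 0.0, 0.1, 5.0, 5.1, 5.0, 5.1]
cloud = delay_embed(signal, dim=2, tau=1)
deaths = h0_persistence(cloud)
print(deaths)
```

In this toy series the values jump from near 0 to near 5 halfway through, so most components merge at small scales while a few survive until a much larger scale. A list of death scales like this could be summarized (for example by its maximum or mean) and passed to a classifier alongside ordinary text features.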