Siamese Long Short-Term Memory for Detecting Conflict of Interest on Scientific Papers

IPTEK: The Journal for Technology and Science, Pub Date: 2019-07-26, DOI: 10.12962/j20882033.v30i2.5008

Akhmad Bakhrul Ilmi, D. Purwitasari, C. Fatichah


Citations: 1

Abstract

Citations from other researchers increase an author's credibility. However, the citation process can be misused to artificially raise a bibliometric indicator such as a researcher's h-index. Researchers may excessively cite their own work, known as self-citation, even when the topics of the references are unrelated to the current article. A further form of misconduct is excessive citation of the work of people related to the researcher, which may or may not be coerced, referred to as conflict of interest (CoI). The proposed method uses a deep learning approach, Siamese Long Short-Term Memory (LSTM), to recognize subject similarities between a scientific article and its references. Standard text similarity measures fail to do so because the contextual relatedness of sentences in the articles has to be learned. Siamese-LSTM learns this contextual relatedness using two identical LSTM networks. The steps of the proposed method are (i) word embedding, to obtain weight values for terms while still considering their semantic relations; (ii) k-means clustering, to generate training data and reduce the time complexity of Siamese-LSTM learning on scientific articles; (iii) learning the Siamese-LSTM weights from the training data to identify the contextual relatedness of sentences; and (iv) calculating the similarity between a scientific article and its references based on the Siamese-LSTM. Empirical experiments are used to analyze the similarity values and the possibility of conflict of interest in an article.

Keywords: Citation, Conflict of Interest, Scientific Text, Deep Learning, Similarity, Text Processing.
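The idea of scoring an article against a reference with two identical LSTM encoders can be illustrated with a minimal sketch. It is not the authors' implementation: the weights here are random and untrained, the input vectors are random placeholders standing in for word embeddings, and the similarity function exp(-L1 distance) is an assumption borrowed from common Siamese-LSTM setups, since the abstract does not state which similarity function is used. The names `TinyLSTM` and `siamese_similarity` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Single-layer LSTM encoder, forward pass only (weights are untrained)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked matrix holding the input, forget, output and candidate gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.hidden_dim = hidden_dim

    def encode(self, sequence):
        """Return the final hidden state for a (timesteps, input_dim) array."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        for x in sequence:
            z = self.W @ np.concatenate([x, h]) + self.b
            i, f, o, g = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
        return h


def siamese_similarity(encoder, seq_a, seq_b):
    """Encode both inputs with the SAME encoder (shared weights), then compare.

    Assumption: similarity = exp(-L1 distance), which lies in (0, 1].
    """
    h_a = encoder.encode(seq_a)
    h_b = encoder.encode(seq_b)
    return float(np.exp(-np.sum(np.abs(h_a - h_b))))


# Placeholder "embedded" texts: rows are words, columns are embedding dimensions.
rng = np.random.default_rng(1)
article = rng.normal(size=(6, 8))
reference = rng.normal(size=(9, 8))

encoder = TinyLSTM(input_dim=8, hidden_dim=16)
sim_same = siamese_similarity(encoder, article, article)
sim_diff = siamese_similarity(encoder, article, reference)
print(sim_same, sim_diff)
```

Because both inputs pass through one encoder object, the two branches are identical by construction, which is what makes the network Siamese. In a trained model, a low score between an article and one of its references would flag that citation for closer inspection.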

