Mining online political opinion surveys for suspect entries: An interdisciplinary comparison

Costantinos Djouvas, Fernando Mendez, Nicolas Tsapatsoulis
Journal of Innovation in Digital Ecosystems, vol. 3, no. 2, pp. 172-182. Published 2016-12-01. DOI: 10.1016/j.jides.2016.11.003. URL: https://www.sciencedirect.com/science/article/pii/S2352664516300256
Citations: 5

Abstract

Filtering data generated by so-called Voting Advice Applications (VAAs) to remove entries that exhibit unrealistic behavior (i.e., entries that cannot correspond to a real political view) is of primary importance. If such entries make up a significant share of a VAA-generated dataset, they can invalidate conclusions drawn from analysis of that dataset. In this work we investigate approaches for automatically identifying entries whose answer patterns appear suspicious. We use two unsupervised data mining techniques and compare their performance against a well-established psychometric approach. Our results suggest that data mining approaches perform comparably to those drawing on psychometric theory, at a fraction of the complexity. More specifically, our simulations show that both data mining techniques and psychometric approaches can identify truly ‘rogue’ data (i.e., completely random data injected into the dataset under investigation). However, when analysing real datasets, the performance of all approaches dropped considerably. This suggests that ‘suspect’ entries are neither random nor clustered. This finding limits the use of unsupervised techniques: they can only complement, rather than substitute for, existing methods of identifying suspicious entries.
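The simulation setup described in the abstract (injecting completely random entries into a dataset and checking whether an unsupervised method recovers them) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not name the two data mining techniques, so a simple k-nearest-neighbour distance score stands in for them, and the synthetic respondents, the 5-point answer scale, the number of questions, and all function names are hypothetical rather than taken from the paper.

```python
import math
import random

N_QUESTIONS = 30   # assumed number of policy statements (hypothetical)
SCALE = (1, 5)     # assumed 5-point Likert coding (hypothetical)


def simulate_real_entry(rng, polarity):
    """One coherent respondent: answers follow a single latent position."""
    position = rng.uniform(-1.0, 1.0)
    entry = []
    for sign in polarity:
        raw = 3 + 2 * sign * position + rng.gauss(0, 0.5)
        # Round to the nearest scale point and clamp to the scale limits.
        entry.append(min(SCALE[1], max(SCALE[0], round(raw))))
    return entry


def simulate_rogue_entry(rng):
    """One injected entry: every answer drawn uniformly at random."""
    return [rng.randint(SCALE[0], SCALE[1]) for _ in range(N_QUESTIONS)]


def knn_scores(entries, k=5):
    """Mean Euclidean distance from each entry to its k nearest neighbours."""
    scores = []
    for i, a in enumerate(entries):
        dists = sorted(math.dist(a, b) for j, b in enumerate(entries) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def run_simulation(n_real=200, n_rogue=20, seed=0):
    """Inject random entries, flag the highest-scoring ones, report recall."""
    rng = random.Random(seed)
    polarity = [rng.choice((-1, 1)) for _ in range(N_QUESTIONS)]
    entries = [simulate_real_entry(rng, polarity) for _ in range(n_real)]
    entries += [simulate_rogue_entry(rng) for _ in range(n_rogue)]
    is_rogue = [False] * n_real + [True] * n_rogue

    scores = knn_scores(entries)
    # Flag the n_rogue entries that sit farthest from their neighbours.
    ranked = sorted(range(len(entries)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_rogue])
    hits = sum(1 for i in flagged if is_rogue[i])
    return flagged, hits / n_rogue


if __name__ == "__main__":
    flagged, recall = run_simulation()
    print(f"flagged {len(flagged)} entries, recall on injected data: {recall:.2f}")
```

A sketch like this only covers the easy, simulated case: uniformly random answers sit far from any coherent answer pattern, so a distance-based score can separate them. It says nothing about real suspect entries, which the abstract reports are neither random nor clustered.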
