Robustness of Word and Character N-gram Combinations in Detecting Deceptive and Truthful Opinions

A. Siagian, M. Aritsugi
Journal of Data and Information Quality (JDIQ), pp. 1-24. Published 2020-01-15.
DOI: https://doi.org/10.1145/3349536
Citations: 7

Abstract

Opinions in reviews about the quality of products or services can be important information for readers. Unfortunately, such opinions may include deceptive ones posted for business reasons. To keep opinions a valuable and trusted source of information, we propose an approach to detecting deceptive and truthful opinions. Specifically, we explore word and character n-gram combinations, function words, and word syntactic n-grams (word sn-grams) as features for classifiers. We also consider applying word correction to the dataset. Our experiments show that classifiers using the word and character n-gram combination features can perform better than those using the other features. Although the experiments indicate that the effect of word correction may be insignificant, we note that deceptive opinions tend to contain fewer error words than truthful ones. To examine the robustness of our features, we then perform cross-classification tests. The results of these tests suggest that the word and character n-gram combination features work well in detecting deceptive and truthful opinions. Interestingly, they also indicate that using word sn-grams as combination features can give good performance.
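
As a rough illustration of what a word and character n-gram combination feature set might look like, the sketch below counts both kinds of n-grams for one review and merges them into a single feature dictionary. It is not the authors' implementation: the tokenizer, the n-gram sizes, the key prefixes, and the function names (`word_ngrams`, `char_ngrams`, `combined_features`) are all assumptions made for this example.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return word n-grams built from lowercased alphanumeric tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return character n-grams (spaces and punctuation included)."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def combined_features(text, word_sizes=(1, 2), char_sizes=(3,)):
    """Count word and character n-grams in one feature dictionary.

    The default sizes are illustrative, not values taken from the paper.
    The "w<n>:" and "c<n>:" prefixes keep the two families from colliding.
    """
    features = Counter()
    for n in word_sizes:
        features.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    for n in char_sizes:
        features.update(f"c{n}:{g}" for g in char_ngrams(text, n))
    return features


feats = combined_features("Great hotel, great stay")
print(feats["w1:great"])        # 2
print(feats["w2:great hotel"])  # 1
print(feats["c3:gre"])          # 2
```

A dictionary like `feats` could then be passed to any classifier that accepts sparse count features. A cross-classification test would train on reviews from one domain and evaluate on another.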