A Study of Predicting the Sincerity of a Question Asked Using Machine Learning

T. Nguyen, P. Meesad
Published in: Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval, 2021-12-17
DOI: 10.1145/3508230.3508258 (https://doi.org/10.1145/3508230.3508258)
Citations: 4

Abstract

The growth of applications in both scientific socialism and naturalism makes it increasingly difficult to assess whether a question is sincere. Such assessment is mandatory for many marketing and financial companies. As a corollary of advances in technology, and in computer science in particular, many uses of data, especially text and images, will be reconfigured beyond recognition, while others face potential extinction. Analyzing text and image data is essential for gaining valuable insights. In this paper, we analyzed the Quora dataset obtained from Kaggle.com in order to filter insincere and spam content, using several preprocessing algorithms and analysis models provided in PySpark. We also used the proposed prediction models to analyze the manner in which users write their posts. Finally, we identified the most accurate of the selected algorithms for classifying questions on Quora: the Gradient Boosted Tree performed best, with an accuracy of 79.5%, followed by Long Short-Term Memory (LSTM) at 78.0%. Compared with the same models built in Scikit-Learn and with the machine learning models GRU, BiLSTM, and BiGRU, the models applied in PySpark achieved better results in classifying questions on Quora.
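The abstract names the ingredients of the pipeline (text preprocessing, then a Gradient Boosted Tree classifier scored by accuracy) but not its configuration. The following is a minimal sketch of that idea under stated assumptions; it is not the authors' PySpark code. It uses only the Python standard library: `tokenize` lowercases text and drops a small hand-picked stop-word list (an assumption), each question becomes a set of word-presence features, and `fit_gbt` performs gradient boosting under logistic loss with depth-1 trees (stumps), using Newton-step leaf values with an L2 penalty `reg_lambda`. The names `TRAIN`, `STOP_WORDS`, `fit_gbt`, `predict_proba` and `accuracy`, and all hyperparameters, are hypothetical, and the six training questions are invented rather than taken from the Quora dataset. In PySpark itself, the corresponding building blocks would be `pyspark.ml.feature.Tokenizer`, `StopWordsRemover`, `CountVectorizer` and `pyspark.ml.classification.GBTClassifier`.

```python
import math
import re

# Hypothetical toy data (invented, NOT from the Quora dataset):
# (question text, label) where 1 = insincere and 0 = sincere.
TRAIN = [
    ("How do I learn Python for data analysis?", 0),
    ("What is the best way to prepare for a job interview?", 0),
    ("Why are people from that country so stupid?", 1),
    ("Why do all politicians lie and steal?", 1),
    ("How does gradient boosting work?", 0),
    ("Why are all those people so stupid and lazy?", 1),
]

# Assumed, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "are", "do", "does", "for", "how",
              "i", "is", "of", "the", "to", "what"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def fit_gbt(docs, labels, vocab, n_rounds=20, learning_rate=0.5, reg_lambda=1.0):
    """Gradient boosting with depth-1 trees (stumps) under logistic loss.

    Each stump splits on the presence of one word; leaf values are
    Newton steps sum(g) / (sum(h) + reg_lambda).
    """
    n, pos = len(labels), sum(labels)
    base = math.log((pos + 1) / (n - pos + 1))  # smoothed prior log-odds
    scores = [base] * n
    stumps = []
    for _ in range(n_rounds):
        probs = [sigmoid(s) for s in scores]
        grads = [y - p for y, p in zip(labels, probs)]  # negative gradient of log loss
        hess = [p * (1 - p) for p in probs]             # second derivative of log loss
        best = None
        for word in vocab:  # vocab is sorted, so tie-breaking is deterministic
            g_in = h_in = g_out = h_out = 0.0
            for doc, g, h in zip(docs, grads, hess):
                if word in doc:
                    g_in, h_in = g_in + g, h_in + h
                else:
                    g_out, h_out = g_out + g, h_out + h
            gain = g_in ** 2 / (h_in + reg_lambda) + g_out ** 2 / (h_out + reg_lambda)
            if best is None or gain > best[0]:
                best = (gain, word, g_in / (h_in + reg_lambda), g_out / (h_out + reg_lambda))
        _, word, v_in, v_out = best
        stump = (word, learning_rate * v_in, learning_rate * v_out)
        stumps.append(stump)
        scores = [s + (stump[1] if word in doc else stump[2]) for s, doc in zip(scores, docs)]
    return base, stumps


def predict_proba(model, doc):
    """Probability that a tokenized question (a set of words) is insincere."""
    base, stumps = model
    return sigmoid(base + sum(v_in if w in doc else v_out for w, v_in, v_out in stumps))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    docs = [set(tokenize(q)) for q, _ in TRAIN]
    labels = [y for _, y in TRAIN]
    vocab = sorted(set().union(*docs))
    model = fit_gbt(docs, labels, vocab)
    preds = [int(predict_proba(model, d) >= 0.5) for d in docs]
    # Training accuracy only; a real evaluation would use a held-out split.
    print("training accuracy:", accuracy(labels, preds))
```

On a toy set this small, the model simply learns that the word "why" marks the insincere examples, so the printed training accuracy of 1.0 says nothing about real performance. The 79.5% reported in the abstract comes from the authors' evaluation on the Quora data, and a faithful reproduction would measure accuracy on held-out questions.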