利用机器学习方法检测 Youtube 视频评论中的垃圾信息

Machine learning with applications Pub Date : 2024-04-16 DOI:10.1016/j.mlwa.2024.100550

Andrew S. Xiao , Qilian Liang

{"title":"利用机器学习方法检测 Youtube 视频评论中的垃圾信息","authors":"Andrew S. Xiao , Qilian Liang","doi":"10.1016/j.mlwa.2024.100550","DOIUrl":null,"url":null,"abstract":"<div><p>Machine Learning models have the ability to streamline the process by which Youtube video comments are filtered between legitimate comments (ham) and spam. In order to integrate machine learning models into regular usage on media-sharing platforms, recent approaches have aimed to develop models trained on Youtube comments, which have emerged as valuable tools for the classification and have enabled the identification of spam content and enhancing user experience. In this paper, eight machine learning approaches are applied to spam detection for YouTube comments. The eight machine learning models include Gaussian Naive Bayes, logistic regression, K-nearest neighbors (KNN) classifier, multi-layer perceptron (MLP), support vector machine (SVM) classifier, random forest classifier, decision tree classifier, and voting classifier. All eight models perform very well, specifically random forest approach can achieve almost perfect performance with average precision of 100% and AUC-ROC of 0.9841. The computational complexity of the eight machine learning approaches are compared.</p></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"16 ","pages":"Article 100550"},"PeriodicalIF":0.0000,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2666827024000264/pdfft?md5=5244427dfd0f509334984878d01998e5&pid=1-s2.0-S2666827024000264-main.pdf","citationCount":"0","resultStr":"{\"title\":\"Spam detection for Youtube video comments using machine learning approaches\",\"authors\":\"Andrew S. Xiao , Qilian Liang\",\"doi\":\"10.1016/j.mlwa.2024.100550\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Machine Learning models have the ability to streamline the process by which Youtube video comments are filtered between legitimate comments (ham) and spam. In order to integrate machine learning models into regular usage on media-sharing platforms, recent approaches have aimed to develop models trained on Youtube comments, which have emerged as valuable tools for the classification and have enabled the identification of spam content and enhancing user experience. In this paper, eight machine learning approaches are applied to spam detection for YouTube comments. The eight machine learning models include Gaussian Naive Bayes, logistic regression, K-nearest neighbors (KNN) classifier, multi-layer perceptron (MLP), support vector machine (SVM) classifier, random forest classifier, decision tree classifier, and voting classifier. All eight models perform very well, specifically random forest approach can achieve almost perfect performance with average precision of 100% and AUC-ROC of 0.9841. The computational complexity of the eight machine learning approaches are compared.</p></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"16 \",\"pages\":\"Article 100550\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S2666827024000264/pdfft?md5=5244427dfd0f509334984878d01998e5&pid=1-s2.0-S2666827024000264-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827024000264\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827024000264","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

机器学习模型能够简化 Youtube 视频评论在合法评论（ham）和垃圾评论之间的过滤过程。为了将机器学习模型集成到媒体共享平台的常规使用中，最近的方法旨在开发针对 Youtube 评论训练的模型，这些模型已成为有价值的分类工具，能够识别垃圾内容并提升用户体验。本文将八种机器学习方法应用于 YouTube 评论的垃圾邮件检测。这八种机器学习模型包括高斯奈维贝叶、逻辑回归、K-近邻（KNN）分类器、多层感知器（MLP）、支持向量机（SVM）分类器、随机森林分类器、决策树分类器和投票分类器。所有八个模型的表现都非常出色，特别是随机森林方法几乎达到了完美的表现，平均精度为 100%，AUC-ROC 为 0.9841。比较了八种机器学习方法的计算复杂度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Spam detection for Youtube video comments using machine learning approaches

Machine Learning models have the ability to streamline the process by which Youtube video comments are filtered between legitimate comments (ham) and spam. In order to integrate machine learning models into regular usage on media-sharing platforms, recent approaches have aimed to develop models trained on Youtube comments, which have emerged as valuable tools for the classification and have enabled the identification of spam content and enhancing user experience. In this paper, eight machine learning approaches are applied to spam detection for YouTube comments. The eight machine learning models include Gaussian Naive Bayes, logistic regression, K-nearest neighbors (KNN) classifier, multi-layer perceptron (MLP), support vector machine (SVM) classifier, random forest classifier, decision tree classifier, and voting classifier. All eight models perform very well, specifically random forest approach can achieve almost perfect performance with average precision of 100% and AUC-ROC of 0.9841. The computational complexity of the eight machine learning approaches are compared.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Machine learning with applications Management Science and Operations Research, Artificial Intelligence, Computer Science Applications

自引率

0.00%

发文量

审稿时长

98 days