Observation Imbalanced Data Text to Predict Users Selling Products on Female Daily with SMOTE, Tomek, and SMOTE-Tomek

Bern Jonathan, P. Putra, Y. Ruldeviyani
Published in: 2020 IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology (IAICT), 2020-07-01
DOI: 10.1109/IAICT50021.2020.9172033
Citations: 14

Abstract

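Female Daily is a beauty platform whose social media application lets users share their beauty experiences by posting images and text. Its terms and conditions prohibit using posts to sell products, yet some users still use the platform to sell beauty products. User posts are recorded in the Female Daily database, and this data is imbalanced: posts that were banned for selling form the minority class, while posts that administrators did not ban because they contain no selling form the majority class. SMOTE and Tomek links are techniques for handling imbalanced data through over-sampling and under-sampling, applied to bring the classes toward balance. This study evaluates the imbalanced Female Daily text data using SMOTE, Tomek, and SMOTE-Tomek. The classifiers are Support Vector Machine (SVM) and Logistic Regression (LR), both trained on TF-IDF vectors, and they are compared to find the best method for predicting users who sell products on Female Daily. The results show that SMOTE, Tomek, and SMOTE-Tomek have only a small effect on precision and recall for the majority class (posts without selling) and also lower them, but improve precision and recall for the minority class (posts selling products). The best combination for each metric is: SMOTE-Tomek with SVM for G-Mean, SMOTE with SVM for minority-class precision, and SMOTE with LR for minority-class recall. The experimental results indicate that the SMOTE or SMOTE-Tomek approach is useful.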
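The resampling steps and the G-Mean metric named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy 2-D points standing in for TF-IDF vectors, the parameter values, and the choice to drop only the majority-class member of each Tomek link are all assumptions made for illustration. The SVM and LR classifiers are not reproduced here.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=2, seed=0):
    """Over-sample: build n_new synthetic points, each on the segment between
    a random minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: euclidean(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # 0 <= gap < 1, position along base -> target
        synthetic.append([b + gap * (t - b) for b, t in zip(base, target)])
    return synthetic


def tomek_links(points, labels):
    """Find Tomek links: mutual nearest neighbours with different labels."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: euclidean(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority=0):
    """Under-sample: drop the majority-class member of every Tomek link
    (an assumed policy; other variants drop both members)."""
    drop = {idx for pair in tomek_links(points, labels)
            for idx in pair if labels[idx] == majority}
    kept = [(p, y) for idx, (p, y) in enumerate(zip(points, labels))
            if idx not in drop]
    return [p for p, _ in kept], [y for _, y in kept]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall on the positive class and recall on the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


# Toy data (hypothetical): label 0 = majority (no selling), 1 = minority (selling).
points = [[0, 0], [0, 1], [1, 0], [1, 1], [2.0, 2.0], [2.2, 2.0], [3, 3], [3, 4]]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

# SMOTE-Tomek sketched as SMOTE followed by Tomek-link cleaning.
minority = [p for p, y in zip(points, labels) if y == 1]
new_points = smote(minority, n_new=2)
all_points = points + new_points
all_labels = labels + [1] * len(new_points)
clean_points, clean_labels = remove_majority_in_links(all_points, all_labels)
```

G-Mean rewards a classifier only when it does well on both classes, which is why it is a common summary metric for imbalanced problems: a model that labels every post as "not selling" has perfect majority recall but a G-Mean of zero.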