Time and performance comparison on suicide detection using various feature engineering and machine learning models
Kittisak Thongsi, Nannaphas Booncherd, Pokpong Songmuang
2023 15th International Conference on Knowledge and Smart Technology (KST), 2023-02-21
DOI: 10.1109/KST57286.2023.10086874 (https://doi.org/10.1109/KST57286.2023.10086874)
Abstract
More people now use social media to express their opinions and emotions. Social media text takes many forms, including text that conveys a tendency toward depression or suicide. We use sentiment analysis to detect suicidal texts, because early detection could save many lives and spare many families. The objective of this research is to find a method that is both accurate and fast to build. We design experiments covering 30 combinations of five machine learning models and six feature engineering methods. All experiments use accuracy and total model generation time as metrics. We use a deep neural network with GloVe embeddings as the comparator, because this combination performed well on this dataset in a Kaggle competition. From the experimental results, we find that the most suitable combination, one that is fast to generate and has good accuracy, is Random Forest with TF-IDF, which achieves an accuracy of 0.897 with a model generation time of 145 seconds.
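The experimental design described in the abstract (every feature engineering method paired with every model, each scored on accuracy and generation time) can be sketched as a small benchmarking harness. This is a minimal illustration, not the authors' code: the toy texts, the labels `flagged` and `other`, the two feature methods, and the two stand-in classifiers (a majority-class baseline and a nearest-centroid rule instead of Random Forest) are all assumptions chosen to keep the sketch dependency-free. It also assumes that "model generation time" covers feature fitting plus training, which the abstract does not spell out.

```python
import math
import time
from collections import Counter, defaultdict
from itertools import product


def tokenize(text):
    return text.lower().split()


def fit_bag_of_words(train_texts):
    """Feature method 1: raw term counts (nothing to learn from training data)."""
    def transform(text):
        return dict(Counter(tokenize(text)))
    return transform


def fit_tfidf(train_texts):
    """Feature method 2: term counts weighted by a smoothed IDF learned on training texts."""
    n_docs = len(train_texts)
    doc_freq = Counter()
    for text in train_texts:
        doc_freq.update(set(tokenize(text)))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def transform(text):
        counts = Counter(tokenize(text))
        # Terms unseen during fitting are dropped.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}
    return transform


def train_majority(vectors, labels):
    """Stand-in model 1: always predict the most frequent training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda vec: majority


def train_nearest_centroid(vectors, labels):
    """Stand-in model 2: predict the class whose mean vector is most similar."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    centroids = {lab: {t: w / counts[lab] for t, w in s.items()} for lab, s in sums.items()}
    norms = {lab: math.sqrt(sum(w * w for w in c.values())) or 1.0
             for lab, c in centroids.items()}

    def predict(vec):
        def score(lab):
            dot = sum(w * centroids[lab].get(t, 0.0) for t, w in vec.items())
            return dot / norms[lab]
        return max(centroids, key=score)
    return predict


def run_grid(feature_methods, models, train, test):
    """Evaluate every (feature method, model) pair on accuracy and generation time."""
    train_texts, train_labels = train
    test_texts, test_labels = test
    results = []
    for (f_name, fit_features), (m_name, train_model) in product(
            feature_methods.items(), models.items()):
        start = time.perf_counter()
        transform = fit_features(train_texts)
        predict = train_model([transform(t) for t in train_texts], train_labels)
        seconds = time.perf_counter() - start  # feature fitting + training only
        correct = sum(predict(transform(t)) == y for t, y in zip(test_texts, test_labels))
        results.append({"features": f_name, "model": m_name,
                        "accuracy": correct / len(test_labels), "seconds": seconds})
    return results


# Toy data, invented purely for illustration.
train = (["i feel hopeless and alone", "nothing matters anymore",
          "great day at the park", "loved the new movie"],
         ["flagged", "flagged", "other", "other"])
test = (["i feel so alone", "what a great movie"], ["flagged", "other"])

feature_methods = {"bow": fit_bag_of_words, "tfidf": fit_tfidf}
models = {"majority": train_majority, "centroid": train_nearest_centroid}

results = run_grid(feature_methods, models, train, test)
for row in sorted(results, key=lambda r: (-r["accuracy"], r["seconds"])):
    print(row)
```

Sorting by accuracy first and time second mirrors the trade-off the abstract describes. With scikit-learn available, the same harness shape would accept `TfidfVectorizer` and `RandomForestClassifier` in place of the stand-ins, at the cost of an external dependency.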