Gazi Hasan Al Masud, Rejaul Islam Shanto, Ishmam Sakin, Muhammad Rafsan Kabir
Effective depression detection and interpretation: Integrating machine learning, deep learning, language models, and explainable AI
Array, Volume 25, Article 100375 (published 2025-02-07)
DOI: 10.1016/j.array.2025.100375
https://www.sciencedirect.com/science/article/pii/S2590005625000025
Abstract
Depression is an increasingly prevalent issue, particularly among young people, significantly affecting their well-being and causing persistent distress. Early detection is crucial to addressing this growing concern. This study applies a range of machine learning, deep learning, and language models to detect depression among Bangladeshi university students. To address class imbalance in the dataset, resampling techniques such as SMOTE (oversampling) and Cluster Centroids (undersampling) are applied. Exhaustive hyperparameter optimization is also performed to improve classification performance. Our results indicate that machine learning algorithms, particularly Random Forest, predict depression effectively, with an accuracy of 91.1% and an F1-score of 91.6%. Language models such as RoBERTa also perform strongly, reaching a recall of 98.6%. Explainable AI (XAI) methods, including SHAP and LIME, are used to interpret model predictions, underscoring the importance of transparency in machine learning. By integrating machine learning, deep learning, natural language processing, and XAI techniques, this work contributes to the early identification of depression. Although this study focuses on Bangladeshi and similar demographic groups, the proposed approaches are adaptable and can be applied to other populations to support generalization.
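The abstract names two resampling strategies without describing them. What follows is a minimal pure-Python sketch of the general ideas, not the authors' pipeline. SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. Cluster Centroids undersamples the majority class by replacing it with k-means cluster centroids. The function names (`smote`, `kmeans`, `cluster_centroids`), the default k = 5, and the two-dimensional toy data are hypothetical and serve only as illustration. In practice, the imbalanced-learn library provides `SMOTE` and `ClusterCentroids` classes for this purpose.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """SMOTE-style oversampling (illustrative sketch, not the paper's code).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def kmeans(points, n_clusters, iters=50, rng=None):
    """Plain Lloyd's k-means; returns the list of cluster centroids."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    for _ in range(iters):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            # assign each point to its nearest current centroid
            nearest = min(range(n_clusters), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # move each centroid to the mean of its members (keep it if empty)
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids


def cluster_centroids(majority, n_keep, rng=None):
    """Cluster Centroids-style undersampling: keep n_keep k-means centroids."""
    return kmeans(majority, n_keep, rng=rng)


# Hypothetical 2-D toy data (40 majority vs 10 minority); not the study's dataset.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Oversample the minority class up to the majority size ...
oversampled = minority + smote(minority, len(majority) - len(minority), rng=rng)
# ... or undersample the majority class down to the minority size.
undersampled = cluster_centroids(majority, len(minority), rng=rng)

print(len(oversampled), len(undersampled))  # 40 10
```

Oversampling keeps every original record and adds synthetic ones. Undersampling discards majority-class detail in exchange for a smaller, balanced training set. Which trade-off suits a given dataset is an empirical question.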